

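layout: post
title: "Reinforcement Learning Week 3 Course Notes"
date: "2015-09-01 10:18:34"
categories: Computer Science
excerpt: "Three things this week: watch the course videos, read Littman (1996) Chapters 1-2, and do Homework 2 (HW2). Video screenshots and notes follow: Reinfor..."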

auth: conge

Three things this week: watch the course videos, read Littman (1996) Chapters 1-2, and do Homework 2 (HW2).

Video screenshots and notes follow:

Reinforcement Learning Basics

In RL, the environment is available to the agent only as perceived states (s). The agent interacts with the environment by taking an action (a), and the environment returns a reward (r) as feedback that tells the agent whether the <s, a> pair was good or not. All of the computation happens in the agent's head.
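
The interaction loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `TwoStateEnv`, its hidden dynamics, and `run_episode` are hypothetical names invented for this note, not anything from the course or from Littman (1996).

```python
class TwoStateEnv:
    """Hypothetical toy environment; the agent only ever sees states and rewards."""

    def __init__(self):
        self.state = 0

    def step(self, action):
        # Hidden dynamics (an assumption for this sketch): taking action 1 in
        # state 0 moves to state 1 and pays 1.0; anything else resets to
        # state 0 with no reward.
        if self.state == 0 and action == 1:
            self.state, reward = 1, 1.0
        else:
            self.state, reward = 0, 0.0
        return self.state, reward


def run_episode(env, policy, steps=10):
    """Run the perceive-act-reward loop and collect (s, a, r) triples."""
    history = []
    s = env.state
    for _ in range(steps):
        a = policy(s)            # the computation "in the agent's head"
        s_next, r = env.step(a)  # the environment replies with a state and a reward
        history.append((s, a, r))
        s = s_next
    return history


# A trivial policy that always picks action 1, regardless of the state.
history = run_episode(TwoStateEnv(), policy=lambda s: 1, steps=4)
total_reward = sum(r for _, _, r in history)
```

The agent never reads the `step` logic directly; it only receives the returned state and reward, which is the point of the RL setting.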

The difference between RL and an MDP is that in an MDP the environment is fully available to the agent, while in RL the environment is available only through the agent's perception.

Demo of RL

An MDP Game

The small orange square represents the agent, which can perform 6 actions. The world is a grid with some colored squares and a green dot. The goal is to figure out what the game is and what the actions do.

Behavioral structure

The goal is to generate a learning algorithm.

Behavioral Structures

Evaluating a policy

Quiz: evaluating a policy

Evaluating a Learner

A better learner returns a policy with good returns while using less computation time and less sample data.
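
As a rough illustration of those three criteria (quality of the returned policy, time, and amount of data), here is a minimal sketch. Everything in it is an assumption made for this note: the two-armed bandit in `pull`, the `sample_average_learner`, and the `evaluate_learner` helper are hypothetical and are not the course's own evaluation code.

```python
import random
import time


def pull(arm, rng):
    """Hypothetical 2-armed bandit: arm 1 pays 1.0 with prob. 0.8, arm 0 with prob. 0.2."""
    p = 0.8 if arm == 1 else 0.2
    return 1.0 if rng.random() < p else 0.0


def sample_average_learner(n_samples, rng):
    """Alternate between the arms, then return the arm with the higher mean reward."""
    totals, counts = [0.0, 0.0], [0, 0]
    for i in range(n_samples):
        arm = i % 2
        totals[arm] += pull(arm, rng)
        counts[arm] += 1
    means = [t / max(c, 1) for t, c in zip(totals, counts)]
    return 0 if means[0] > means[1] else 1


def evaluate_learner(learner, n_samples, seed=0, eval_pulls=1000):
    """Report the three criteria: return of the learned policy, time, and samples."""
    rng = random.Random(seed)
    start = time.perf_counter()
    best_arm = learner(n_samples, rng)      # the learned (constant) policy
    elapsed = time.perf_counter() - start   # computation time spent learning
    avg_return = sum(pull(best_arm, rng) for _ in range(eval_pulls)) / eval_pulls
    return {"avg_return": avg_return, "seconds": elapsed, "samples": n_samples}


report = evaluate_learner(sample_average_learner, n_samples=200)
```

Running `evaluate_learner` with different `n_samples` values gives a crude way to compare how much data a learner needs before the returned policy becomes good.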

Recap

Summary

2015-08-31 first draft
2015-12-02 reviewed and revised