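---
layout: post
title: "Reinforcement Learning Week 3 Course Notes"
date: "2015-09-01 10:18:34"
categories: [Computer Science]
excerpt: "Three things this week: watch the course videos, read Littman (1996) Chapters 1-2, and do Homework 2 (HW2). Below are the video screenshots and notes: Reinfor..."
---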
Three things this week: watch the course videos, read Littman (1996) Chapters 1-2, and do Homework 2 (HW2).
Below are the video screenshots and my notes:
In RL, the environment is available to the agent only through perceived states (s). The agent interacts with the environment by taking actions (a), and the environment returns a reward (r) as feedback that tells the agent whether the <s, a> pair was good or not. All of the computation happens in the agent's head.
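A minimal sketch of that interaction loop in Python. The `ChainEnvironment` (a hypothetical four-state line where reaching the right end pays +1) and `RandomAgent` classes are illustrations of my own, not from the course; the point is only that the agent sees states and rewards through `step()`, while whatever it computes (here, just a log of rewards per <s, a> pair) lives inside the agent.

```python
import random


class ChainEnvironment:
    """Hypothetical toy environment: states 0..3 on a line; reaching state 3 pays +1.
    Its internals are hidden from the agent, which only sees what step() returns."""

    def __init__(self, n_states=4):
        self.n_states = n_states
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # action is "left" or "right"; returns (next_state, reward, done)
        if action == "right":
            self.state = min(self.state + 1, self.n_states - 1)
        else:
            self.state = max(self.state - 1, 0)
        done = self.state == self.n_states - 1
        reward = 1.0 if done else 0.0
        return self.state, reward, done


class RandomAgent:
    """Hypothetical agent. Its 'head' is reward_log: rewards seen for each <s, a> pair."""

    def __init__(self, actions, seed=0):
        self.actions = actions
        self.rng = random.Random(seed)
        self.reward_log = {}  # (state, action) -> list of observed rewards

    def act(self, state):
        # A real learner would use its memory here; this one just picks at random.
        return self.rng.choice(self.actions)

    def observe(self, state, action, reward):
        self.reward_log.setdefault((state, action), []).append(reward)


def run_episode(env, agent, max_steps=100):
    state = env.reset()
    total = 0.0
    for _ in range(max_steps):
        action = agent.act(state)                    # agent chooses a
        next_state, reward, done = env.step(action)  # environment returns s' and r
        agent.observe(state, action, reward)         # computation stays inside the agent
        total += reward
        state = next_state
        if done:
            break
    return total


env = ChainEnvironment()
agent = RandomAgent(["left", "right"])
print(run_episode(env, agent))
```

Because this agent acts randomly, the episode return is either 0.0 or 1.0, depending on whether it stumbles onto the goal within `max_steps`.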
The difference between RL and an MDP is that in an MDP the environment (its transition and reward functions) is fully available to the agent, while in RL the environment is available only through the agent's perception.
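To make the contrast concrete, here is a sketch under stated assumptions: a hypothetical four-state chain with discount `GAMMA = 0.9`. In the MDP setting, `value_iteration` reads the model `transition(s, a)` directly and plans. In the RL setting, `q_learning` (Q-learning, swapped in as one standard example of an RL learner; the notes do not name an algorithm at this point) only uses the `(s, a, r, s')` samples it gets back from acting. The `transition` call inside `q_learning` stands in for "acting in the world"; the learner never inspects the model any other way.

```python
import random

# Hypothetical toy problem, not from the course.
N_STATES, ACTIONS, GAMMA = 4, ("left", "right"), 0.9
GOAL = N_STATES - 1


def transition(s, a):
    """Deterministic model of the chain: returns (next_state, reward)."""
    if s == GOAL:
        return s, 0.0  # absorbing goal state
    s2 = min(s + 1, GOAL) if a == "right" else max(s - 1, 0)
    return s2, (1.0 if s2 == GOAL else 0.0)


# MDP setting: the planner reads transition() directly (full model) and plans.
def value_iteration(iters=100):
    V = [0.0] * N_STATES
    for _ in range(iters):
        V = [max(r + GAMMA * V[s2] for s2, r in (transition(s, a) for a in ACTIONS))
             for s in range(N_STATES)]
    return V


# RL setting: the learner only sees sampled (s, a, r, s') from acting.
def q_learning(episodes=500, alpha=0.5, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        s = 0
        while s != GOAL:
            # Epsilon-greedy: mostly exploit current estimates, sometimes explore.
            if rng.random() < epsilon:
                a = rng.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda act: Q[(s, act)])
            s2, r = transition(s, a)  # stands in for taking an action in the world
            target = r + GAMMA * max(Q[(s2, b)] for b in ACTIONS)
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            s = s2
    return Q


V = value_iteration()
Q = q_learning()
greedy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(GOAL)}
print(V)       # close to [0.81, 0.9, 1.0, 0.0]
print(greedy)  # the learned greedy policy moves "right" in every non-goal state
```

Both approaches end up preferring "right", but the planner gets there by reading the model, while the learner has to collect experience first.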
The small orange square represents the agent, and it can perform 6 actions. The world is a grid containing some colored squares and a green dot. The goal is to figure out what the game is and what each action does.
The goal is to develop a learning algorithm.
2015-08-31 first draft
2015-12-02 reviewed and revised