---
layout: post
title: "Reinforcement Learning Week 11 Course Notes"
date: "2015-11-02 10:58:24"
categories: Computer Science
excerpt: "This week watching Options. The readings are Sutton, Precup, Singh (1..."
author: conge
---

This week: watching Options. The readings are Sutton, Precup, Singh (1999).

Figure: Generalizing Generalization

Figure: Things to make RL hard

Figure: Temporal Abstraction

Figure: Temporal Abstraction Options

Figure: Temporal Abstraction Option Function

Figure: Quiz 1: Pac-Man Problems

Figure: Quiz 1 solution: Pac-Man Problems

Figure: How It Comes Together

Figure: Goal Abstraction

Figure: Monte Carlo Tree Search

In the figure above, circles are states and edges are transitions. π(s) = argmax<sub>a</sub> Q̂(s, a) is the policy over the known part of the tree: in these states we already know which action to take by following π (the pink edges). When we reach an unknown state, we apply the rollout policy π<sub>r</sub> and simulate actions deep into the tree; we then back up the simulated returns and update π<sub>r</sub> and π so that we know what to select at each state, including the unknown state where the simulation started. π expands as we figure out the policy at previously unknown states. We then repeat the "Select, Expand, Simulate, Back up" process.
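To make the "Select, Expand, Simulate, Back up" loop concrete, here is a minimal Python sketch of MCTS, not the lecture's own code. It assumes a generic simulator interface of my choosing (`actions(state)`, `step(state, action) -> (next_state, reward)`, `is_terminal(state)`), and it uses the common UCT rule for the selection step, since the lecture does not commit to a particular rule; the uniform-random rollout stands in for π<sub>r</sub>.

```python
import math
import random

class Node:
    """One state in the known part of the tree (where π is defined)."""
    def __init__(self, state, parent=None):
        self.state = state
        self.parent = parent
        self.children = {}   # action -> child Node
        self.visits = 0
        self.value = 0.0     # running mean of backed-up returns, i.e. Q-hat

def uct_select(node, c=1.4):
    """Select: pick the child greedily in Q-hat plus an exploration bonus (UCT)."""
    return max(node.children.items(),
               key=lambda kv: kv[1].value +
                              c * math.sqrt(math.log(node.visits) / kv[1].visits))

def mcts(root_state, actions, step, is_terminal,
         n_iters=1000, sim_depth=50, gamma=1.0):
    root = Node(root_state)
    for _ in range(n_iters):
        node = root
        # 1. Select: descend the known tree while every action has been tried.
        while node.children and len(node.children) == len(actions(node.state)):
            _, node = uct_select(node)
        # 2. Expand: add one untried action as a new leaf (this is where π grows).
        untried = [a for a in actions(node.state) if a not in node.children]
        if untried and not is_terminal(node.state):
            a = random.choice(untried)
            next_state, _ = step(node.state, a)
            child = Node(next_state, parent=node)
            node.children[a] = child
            node = child
        # 3. Simulate: rollout policy π_r (here uniform random) from the leaf.
        ret, discount, state = 0.0, 1.0, node.state
        for _ in range(sim_depth):
            if is_terminal(state):
                break
            state, reward = step(state, random.choice(actions(state)))
            ret += discount * reward
            discount *= gamma
        # 4. Back up: update the Q-hat estimates along the path to the root
        #    (for simplicity the same return is backed up along the whole path).
        while node is not None:
            node.visits += 1
            node.value += (ret - node.value) / node.visits
            node = node.parent
    # Act greedily at the root: return the most-visited action.
    return max(root.children.items(), key=lambda kv: kv[1].visits)[0]
```

Calling `mcts(s0, actions, step, is_terminal)` returns the most-visited root action; the tree (the "known" part where π is defined) grows by one node per iteration, exactly the expansion step described above.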

Figure: Monte Carlo Tree Search

Figure: MCTS Properties

Figure: Recap

2015-10-28 first draft
2015-11-01 finished.
2015-12-04 reviewed and revised