Generalizing Generalization

Things that make RL hard

 Temporal Abstraction 

Temporal Abstraction Options


Temporal Abstraction Option Function

 Pac-Man Problems

Quiz 1: Pac-Man Problems

Quiz 1 solution Pac-Man Problems

How It Comes Together

Goal Abstraction

 Monte Carlo Tree Search

Monte Carlo Tree Search

In the figure above, circles are states and edges are transitions. π, derived from the estimate Q̂(s,a), is the policy for the known part of the tree: in these states we know which action to take by following π (pink edges). When we reach an unknown state, we apply the rollout policy π<sub>r</sub> and simulate actions deep into the tree. We then back up the results and update π<sub>r</sub> and π to decide what to select at each state, including the unknown state where the simulation started. π expands as we work out the policy at that unknown state. The "select, expand, simulate, back up" process then repeats.
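The loop described here can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the lecture's implementation: the toy problem (add 1 or 2 for three steps, reward 1 only if the total is exactly 6), the names `Node`, `mcts`, and `best_action`, the UCB1 selection rule, and the exploration constant `c=1.4` are all hypothetical choices made for the example. The rollout policy is kept as a fixed uniform-random policy for simplicity, so only the tree estimates are updated during back-up.

```python
import math
import random

# Hypothetical toy problem: add 1 or 2 to a running total for HORIZON steps;
# the episode pays 1.0 only if the final total equals TARGET.
ACTIONS = (1, 2)
HORIZON = 3
TARGET = 6

def step(state, action):
    total, depth = state
    return (total + action, depth + 1)

def is_terminal(state):
    return state[1] >= HORIZON

def reward(state):
    return 1.0 if state[0] == TARGET else 0.0

class Node:
    def __init__(self, state):
        self.state = state
        self.children = {}    # action -> Node: the known part of the tree
        self.visits = 0
        self.value_sum = 0.0

    def q(self):
        # Running estimate Q-hat of the return through this node
        return self.value_sum / self.visits if self.visits else 0.0

def select_action(node, c=1.4):
    # Tree policy: UCB1 over the children's Q-hat estimates (an assumed choice)
    def ucb(action):
        child = node.children[action]
        return child.q() + c * math.sqrt(math.log(node.visits) / child.visits)
    return max(ACTIONS, key=ucb)

def rollout(state, rng):
    # Rollout policy: uniform random actions until the episode ends
    while not is_terminal(state):
        state = step(state, rng.choice(ACTIONS))
    return reward(state)

def mcts(root_state, iterations, rng):
    root = Node(root_state)
    for _ in range(iterations):
        node, path = root, [root]
        # Select: follow the tree policy while the node is fully expanded
        while not is_terminal(node.state) and len(node.children) == len(ACTIONS):
            node = node.children[select_action(node)]
            path.append(node)
        # Expand: add one previously unknown child state to the tree
        if not is_terminal(node.state):
            untried = [a for a in ACTIONS if a not in node.children]
            action = rng.choice(untried)
            child = Node(step(node.state, action))
            node.children[action] = child
            node = child
            path.append(node)
        # Simulate: estimate the new state's value with the rollout policy
        value = rollout(node.state, rng)
        # Back up: update the estimates along the selected path
        for visited in path:
            visited.visits += 1
            visited.value_sum += value
    return root

def best_action(root):
    # Recommend the most-visited action at the root
    return max(root.children, key=lambda a: root.children[a].visits)

if __name__ == "__main__":
    tree = mcts((0, 0), 2000, random.Random(0))
    print("recommended first action:", best_action(tree))
```

In this toy problem only the sequence 2, 2, 2 reaches the target, so the estimate for the root action 1 stays at zero while the search concentrates its visits on action 2.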

MCTS Properties
