

layout: post
title: "Reinforcement Learning Week 10 Course Notes"
date: "2015-10-31 04:57:13"
categories: Computer Science
excerpt: "This week watch POMDPs. The reading is Littman (2009). POMDP POMDPs g..."

auth: conge

This week

Partially Observable MDPs

POMDP

POMDP definition

quiz 1

quiz 2: POMDP Example 

Solution:

Solution

State Estimation

State Estimation
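
As a rough illustration of state estimation, here is a minimal sketch of the standard belief update, b'(s') ∝ O(a, s', z) · Σ_s T(a, s, s') · b(s). The array layouts (`T[a, s, s2]`, `O[a, s2, z]`), the function name `belief_update`, and the toy numbers are assumptions made for this example and are not taken from the lecture.

```python
import numpy as np

def belief_update(b, T, O, a, z):
    """Return the new belief after taking action a and observing z.

    b: shape (S,)       current belief over states
    T: shape (A, S, S)  T[a, s, s2] = P(s2 | s, a)   (assumed layout)
    O: shape (A, S, Z)  O[a, s2, z] = P(z | s2, a)   (assumed layout)
    """
    predicted = b @ T[a]                   # sum_s b(s) * T(a, s, s2)
    unnormalized = O[a][:, z] * predicted  # weight by observation likelihood
    norm = unnormalized.sum()              # equals P(z | b, a)
    if norm == 0.0:
        raise ValueError("observation has zero probability under this belief")
    return unnormalized / norm

# Toy two-state, one-action, two-observation model (made-up numbers).
T = np.array([[[0.9, 0.1],
               [0.2, 0.8]]])
O = np.array([[[0.8, 0.2],
               [0.3, 0.7]]])
b0 = np.array([0.5, 0.5])
b1 = belief_update(b0, T, O, a=0, z=0)
```

The normalizer is the probability of the observation given the belief and action, so the updated belief is again a probability distribution over states.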

 Value Iteration in POMDPs 

Math

Value Iteration in POMDPs
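
For reference, one common way to write value iteration over belief states is sketched below. The notation is an assumption for this note and may differ from the lecture slides: b is a belief, a an action, z an observation, γ the discount factor, and b^{a,z} the belief reached from b after taking a and observing z.

```latex
V_t(b) = \max_{a} \Big[ \sum_{s} b(s)\, R(s,a)
         + \gamma \sum_{z} P(z \mid b, a)\, V_{t-1}\big(b^{a,z}\big) \Big],
\qquad
b^{a,z}(s') = \frac{O(a, s', z) \sum_{s} T(s, a, s')\, b(s)}{P(z \mid b, a)}
```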

quiz 3: piecewise linear and convex.

piecewise linear and convex


algorithm

algorithm for vector purge

In the figure:

Quiz 4: Domination
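
To make the idea of domination concrete, here is a minimal sketch that represents a piecewise linear and convex value function as a max over linear functions of the belief (often called alpha vectors) and drops any vector that is pointwise dominated by another. The names `value` and `purge_pointwise_dominated` and the numbers are hypothetical. This is only the simple pointwise test: a vector can also be useless without being pointwise dominated by any single vector, and finding those generally needs a linear program, which is not shown here.

```python
import numpy as np

def value(belief, alphas):
    """Piecewise linear convex value: max over vectors of alpha . belief."""
    return float(np.max(np.asarray(alphas, dtype=float) @ belief))

def purge_pointwise_dominated(alphas):
    """Drop every vector that some other vector matches or beats in all states."""
    alphas = np.asarray(alphas, dtype=float)
    keep = []
    for i, v in enumerate(alphas):
        dominated = False
        for j, w in enumerate(alphas):
            if i == j:
                continue
            # w dominates v if it is >= everywhere and strictly better somewhere;
            # for exact duplicates, keep only the earliest copy.
            if np.all(w >= v) and (np.any(w > v) or j < i):
                dominated = True
                break
        if not dominated:
            keep.append(i)
    return alphas[keep]

# Made-up vectors over two states; [0.2, 0.2] lies below [0.6, 0.6] everywhere.
alphas = np.array([[1.0, 0.0],
                   [0.0, 1.0],
                   [0.2, 0.2],
                   [0.6, 0.6]])
purged = purge_pointwise_dominated(alphas)
b = np.array([0.5, 0.5])
```

Removing a dominated vector never changes the value function, since that vector is never the maximizer at any belief.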

Learning POMDP

RL for POMDPs

Quiz 4: relationship of POMDP and other Markov systems

 Learning Memoryless Policies (Model free RL of POMDP)

quiz 5:

Quiz verification

RL as POMDP:

quiz 7:  Bayesian RL

Bayesian RL

Predictive State Representation

Predictive State Representation

quiz 8: Using belief state to figure out Predictive State

Predictive State

PSR Theorem

Why go to PSR?

 What Have We Learned?

2015-10-23 first draft
2015-11-03 completed
2015-12-04 reviewed