

layout: post
title: "Reinforcement Learning Week 4 Course Notes"
date: "2015-09-06 12:04:19"
categories: Computer Science
excerpt: "Three things this week: watch the course videos, read Sutton (1988), and do Homework 3 (HW3). Below are the video screenshots and notes: Temporal Difference ..."

auth: conge

Three things this week: watch the course videos, read Sutton (1988), and do Homework 3 (HW3).

Below are the video screenshots and my notes:

Temporal Difference Learning

Read Sutton 1988 first

Three families of RL algorithms

  1. Model based
  2. Model free
  3. Policy search

TD-lambda


Quiz 1: TD-lambda Example

Quiz 2: Estimating from Data

Computing Estimates Incrementally

Quiz 2: which choices of alpha will make learning converge (tip: if the exponent i is greater than 1, the sum of 1/T<sup>i</sup> over T is bounded).
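
To make the incremental-update idea concrete, here is a minimal sketch (not code from the course; the sample data and the step-size schedule alpha_T = 1/T are my own assumptions) of a running estimate V_T = V_{T-1} + alpha_T (x_T - V_{T-1}). This schedule converges because the step sizes sum to infinity while their squares have a bounded sum, which is exactly the condition the quiz is about.

```python
import random

def incremental_estimate(samples):
    """Update a value estimate incrementally: V_T = V_{T-1} + alpha_T * (x_T - V_{T-1}).

    With alpha_T = 1/T the estimate equals the running sample mean, and the
    step sizes satisfy sum(alpha_T) = infinity while sum(alpha_T**2) is bounded.
    """
    V = 0.0
    for T, x in enumerate(samples, start=1):
        alpha = 1.0 / T          # decaying learning rate
        V = V + alpha * (x - V)  # incremental update toward the new sample
    return V

# Example: noisy samples around a true value of 5.0
random.seed(0)
data = [5.0 + random.gauss(0, 1) for _ in range(10000)]
print(incremental_estimate(data))  # approaches 5.0 as more samples arrive
```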

TD(1) Rule
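
As a reference for this rule, here is a minimal sketch of tabular TD(λ) with accumulating eligibility traces in the standard Sutton & Barto form (the state names, rewards, and parameters in the example are illustrative assumptions, and the notation may differ slightly from the slides). Setting λ = 1 gives the TD(1) rule; setting λ = 0 recovers TD(0).

```python
def td_lambda_episode(V, episode, alpha=0.1, gamma=1.0, lam=1.0):
    """One pass of tabular TD(lambda) with accumulating eligibility traces.

    `episode` is a list of (s, r, s_next) transitions. With lam=1 this is the
    TD(1) rule; with lam=0 only the visited state is updated, i.e. TD(0).
    """
    e = {s: 0.0 for s in V}                # eligibility traces, reset per episode
    for s, r, s_next in episode:
        td_error = r + gamma * V[s_next] - V[s]
        e[s] += 1.0                        # accumulate trace for the visited state
        for state in V:                    # every eligible state shares the update
            V[state] += alpha * td_error * e[state]
            e[state] *= gamma * lam        # decay traces
    return V


# Toy episode: s1 -(r=0)-> s2 -(r=1)-> end, undiscounted
V = {"s1": 0.0, "s2": 0.0, "end": 0.0}
episode = [("s1", 0.0, "s2"), ("s2", 1.0, "end")]
td_lambda_episode(V, episode, alpha=0.5, gamma=1.0, lam=1.0)
print(V)  # {'s1': 0.5, 's2': 0.5, 'end': 0.0}: the final reward reaches s1 too
```

Note how, with λ = 1, the eligibility trace lets the reward observed at the end of the episode propagate all the way back to s1 in a single pass.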

TD(1) with and without repeated states

Why TD(1) is "Wrong"

TD(0) Rule
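
For comparison with TD(1), a minimal sketch of the tabular TD(0) update V(s) ← V(s) + α(r + γV(s') − V(s)); the transition and numbers below are made up for illustration.

```python
def td0_update(V, s, r, s_next, alpha=0.1, gamma=0.9):
    """One TD(0) step: move V(s) toward the one-step target r + gamma * V(s')."""
    td_error = r + gamma * V[s_next] - V[s]
    V[s] += alpha * td_error
    return V


# Single observed transition s1 -(r=1)-> s2
V = {"s1": 0.0, "s2": 0.5}
td0_update(V, "s1", 1.0, "s2")
print(V["s1"])  # ≈ 0.145 = 0.1 * (1.0 + 0.9 * 0.5 - 0.0)
```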

Connecting TD(0) and TD(1)

K-Step Estimators

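For reference, the K-step estimator looks ahead K rewards and then falls back on the current value estimate (one common indexing convention; TD(0) corresponds to K = 1 and TD(1) to K → ∞):

```latex
E_K(s_t) \;=\; r_{t+1} + \gamma\, r_{t+2} + \cdots + \gamma^{K-1} r_{t+K} + \gamma^{K}\, V(s_{t+K})
```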

K-step Estimators and TD-lambda

TD-lambda can be seen as a weighted combination of K-step estimators, where the K-step estimator gets weight λ<sup>K-1</sup>(1-λ).
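
Written out (a standard identity, not copied from the slides), the TD-lambda estimator blends all K-step estimators, and the geometric weights sum to 1:

```latex
E^{\lambda}(s_t) \;=\; (1-\lambda) \sum_{K=1}^{\infty} \lambda^{K-1} E_K(s_t),
\qquad
(1-\lambda) \sum_{K=1}^{\infty} \lambda^{K-1} \;=\; 1 .
```

With λ = 0 all the weight sits on the one-step estimator (TD(0)); as λ → 1 the weight shifts toward the full-return estimator (TD(1)).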

Why use TD-lambda?

The best-performing λ is typically not TD(0), but some λ strictly between 0 and 1.

Summary

2015-09-05 Initial draft
2015-12-03 Reviewed and revised up to the "Connecting TD(0) and TD(1)" slide