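layout: post
title: "Machine Learning Notes, Week 16"
date: "2016-05-01 09:24:18"
categories: Computer Science
excerpt: "I have not updated the Machine Learning notes since week 10. One reason is that I had no time to study at all; the other is that all of the remaining material was already covered, last year, in..."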

auth: conge

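I have not updated these Machine Learning notes since week 10. One reason is that I had no time to study at all; the other is that everything in the remaining part of the course was already covered in another course I took last year, Reinforcement Learning. If you need those notes, go straight to the links below. Reinforcement Learning Week 1 course notes: MDP; Reinforcement Learning Week 12 course notes: Game Theory I; Reinforcement Learning Week 13 course notes: Game Theory II & III.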

The material below is not covered in the RL course and is unique to this course.

Reinforcement Learning

RL

API

Reinforcement learning history

More RL "APIs"

What do you call these?

Three ways of solving RL problems

Q-learning

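With Q, we can find U or π without knowing the transition function or the reward function: U(s) is the maximum of Q(s, a) over actions a, and π(s) is the action that attains that maximum. This is why Q-learning works.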
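A minimal sketch of reading U and π off a Q table. The table layout (a dict of dicts), the state and action names, and the numbers are all made up for illustration; only the max/argmax relationship comes from the notes.

```python
# Hypothetical Q table: Q[state][action] -> estimated value (made-up numbers).
Q = {
    "s0": {"left": 0.2, "right": 0.7},
    "s1": {"left": 1.0, "right": 0.4},
}

def utility(Q, s):
    """U(s) = max over actions a of Q(s, a)."""
    return max(Q[s].values())

def policy(Q, s):
    """pi(s) = the action a that maximizes Q(s, a)."""
    return max(Q[s], key=Q[s].get)

# Neither function needs the transition model or the reward function.
U = {s: utility(Q, s) for s in Q}
PI = {s: policy(Q, s) for s in Q}
```

Note that both lookups only touch the Q table itself, which is the point of the remark above.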

what Q-learning can do

Estimating Q From Transitions

What does V converge to?

V converges to the expected value of X when the learning rates alpha<sub>t</sub> satisfy two conditions: the alpha<sub>t</sub> sum to infinity, but their squares sum to a finite number (e.g., alpha<sub>t</sub> = 1/t).
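A small sketch of this convergence, assuming the usual incremental update V ← V + alpha<sub>t</sub> (x<sub>t</sub> − V); the update rule, the Gaussian samples, and the function name are assumptions made for illustration, since the original slide is not reproduced here. With alpha<sub>t</sub> = 1/t the estimate is exactly the running sample mean.

```python
import random

def running_estimate(samples):
    """Apply V <- V + alpha_t * (x_t - V) with alpha_t = 1/t."""
    v = 0.0
    for t, x in enumerate(samples, start=1):
        alpha = 1.0 / t          # sums to infinity, squares sum to a finite value
        v += alpha * (x - v)
    return v

# Made-up data: draws of X with true mean 5.0.
rng = random.Random(0)
samples = [rng.gauss(5.0, 2.0) for _ in range(10000)]

v = running_estimate(samples)
sample_mean = sum(samples) / len(samples)
```

Here `v` tracks `sample_mean` up to floating-point error, and both approach the true mean as more samples arrive.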

Q learning proof

The first step is a bit ambiguous because Q-hat changes over time. But it works in practice.

Q learning steps

Q-learning converges only if every state-action pair (s, a) is visited infinitely often and the learning rates alpha<sub>t</sub> sum to infinity while their squares sum to something less than infinity.
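A minimal sketch of tabular Q-learning under these conditions. The three-state chain environment, the discount of 0.9, and all names are hypothetical; the update is the standard one, Q(s, a) ← Q(s, a) + alpha (r + γ max Q(s', a') − Q(s, a)). Actions are chosen uniformly at random so every pair keeps being visited, and alpha is 1 / (number of visits to that pair).

```python
import random
from collections import defaultdict

GAMMA = 0.9
ACTIONS = (0, 1)   # 0 = left, 1 = right
TERMINAL = 2

def step(s, a):
    """Hypothetical deterministic chain: states 0 and 1, terminal state 2."""
    s_next = min(s + 1, TERMINAL) if a == 1 else max(s - 1, 0)
    reward = 1.0 if s_next == TERMINAL else 0.0
    return s_next, reward

def q_learning(episodes=2000, seed=0):
    rng = random.Random(seed)
    Q = defaultdict(float)      # Q[(s, a)], initialised to 0
    visits = defaultdict(int)   # visit counts per (s, a)
    for _ in range(episodes):
        s = 0
        while s != TERMINAL:
            a = rng.choice(ACTIONS)            # random behaviour visits every pair
            s_next, r = step(s, a)
            visits[(s, a)] += 1
            alpha = 1.0 / visits[(s, a)]       # decaying learning rate
            if s_next == TERMINAL:
                best_next = 0.0
            else:
                best_next = max(Q[(s_next, b)] for b in ACTIONS)
            Q[(s, a)] += alpha * (r + GAMMA * best_next - Q[(s, a)])
            s = s_next
    return Q

Q = q_learning()
```

For this toy chain the optimal values are Q(1, right) = 1, Q(0, right) = 0.9, and 0.81 for both left actions, so the learned table can be checked against them. The learner never looks inside `step`; it only sees sampled transitions.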

Greedy exploration

exploration & exploitation.
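One common way to trade off the two is ε-greedy action selection: explore with a random action with probability ε, otherwise exploit the current best action. Whether this is exactly what the slide showed is an assumption; the function name and Q values below are made up.

```python
import random

def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.choice(list(q_values))       # explore
    return max(q_values, key=q_values.get)      # exploit

# Made-up action values for a single state.
q_values = {"left": 0.2, "right": 0.7}

rng = random.Random(0)
picks = [epsilon_greedy(q_values, 0.1, rng) for _ in range(1000)]
```

With ε = 0.1 most picks are the greedy action, but the other action still gets tried occasionally, which is what keeps every (s, a) pair visited.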

Wrap up

The exam is today, and I have not reviewed properly at all. Wish me luck.

2016-04-30 first draft