

---
layout: post
title: "ML4T Notes | 03-05 Reinforcement learning"
date: "2019-03-25 03:25:25"
categories: Computer Science
auth: conge
tags: Machine_Learning Trading OMSCS
---

01 - Overview

Learners that merely forecast price changes have a shortcoming: a forecast alone does not tell us what action to take.

Reinforcement learners create policies that provide specific direction on which action to take.

Time: 00:00:28

02 - The RL problem

Reinforcement learning describes a problem, not a solution.

The sense, think, act cycle: a robot interacts with the environment by sensing the environment, reasoning over what it sees, and taking actions. The actions change the environment, and then the robot senses the environment again...

In reinforcement learning, the environment is in some state S, which the agent observes. The agent's policy π maps states to actions; when the agent takes action A, the environment transitions and hands back a reward R.

How do we arrive at this policy? Again: the learning algorithm refines π over time so as to maximize the reward received.

Now in terms of trading: the environment is the market, the state is what we observe about the market and our position, the actions are buy, sell, and hold, and the reward is the return we make on our trades.
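The sense, think, act cycle described above can be sketched in a few lines. Everything here is a hypothetical stand-in: the "market" is a random-walk price list and the policy is a hand-written rule, not anything learned.

```python
import random

def sense(market):
    """Observe the environment; here the state is just the latest price."""
    return market[-1]

def think(state, policy):
    """Consult the policy: map the observed state to an action."""
    return policy(state)

def act(market, action):
    """Act on the environment; a real market would react to our orders."""
    market.append(market[-1] + random.gauss(0, 1))

# A trivial hand-written policy for illustration: buy when the price is low.
policy = lambda price: "BUY" if price < 100 else "HOLD"

market = [100.0]
for _ in range(5):          # the sense, think, act cycle
    state = sense(market)
    action = think(state, policy)
    act(market, action)
```

A reinforcement learner's job is to replace the hand-written `policy` with one learned from rewards.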

Time: 00:03:56

03 - Trading as an RL problem

Quiz: consider buy, sell, holding long, Bollinger value, return from trade, and daily return. Is each of these a state, an action, or a reward?

Consider each of these factors: buy and sell are actions; holding long and the Bollinger value describe state; return from a trade is a reward; the daily return can serve as either state or reward.

Time: 00:01:07

04 - Mapping trading to RL

The task: learn how to trade a particular stock.

The policy tells us what to do in each state. We learn the policy by observing how we accrue money, or fail to, based on the actions we take in the environment.

Time: 00:01:51

05 - Markov decision problems

MDP

A Markov decision problem is defined by:

- a set of states $S$
- a set of actions $A$
- a transition function $T(s, a, s')$: the probability of ending up in state $s'$ after taking action $a$ in state $s$
- a reward function $R(s, a)$

The problem for a reinforcement learning algorithm is to find the policy $\pi$, ideally the optimal policy $\pi^*$, that will maximize reward over time.

When T and R are known, the algorithms that will find this optimal policy are policy iteration and value iteration.
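As a sketch of value iteration on a toy MDP; the three states, two actions, transition probabilities, and rewards below are made up purely for illustration:

```python
import numpy as np

# A toy MDP: T[s, a, s'] is the probability of landing in s' after
# taking action a in state s; R[s, a] is the immediate reward.
n_states, n_actions, gamma = 3, 2, 0.9

T = np.zeros((n_states, n_actions, n_states))
T[0, 0, 1] = 1.0   # action 0 moves deterministically "forward"
T[1, 0, 2] = 1.0
T[2, 0, 2] = 1.0
T[:, 1, 0] = 1.0   # action 1 resets to state 0

R = np.zeros((n_states, n_actions))
R[1, 0] = 1.0      # moving forward from state 1 pays off

# Value iteration: repeatedly back up
#   V(s) = max_a [ R(s, a) + gamma * sum_s' T(s, a, s') * V(s') ]
V = np.zeros(n_states)
for _ in range(200):
    V = np.max(R + gamma * T @ V, axis=1)

# The optimal policy is greedy with respect to the converged values.
pi = np.argmax(R + gamma * T @ V, axis=1)
```

Here the learner discovers the cycle 0 → 1 → 2 → reset, collecting the reward once per loop; policy iteration would reach the same $\pi^*$ by alternating evaluation and improvement steps.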

Time: 00:02:23

06 - Unknown transitions and rewards

When the transition function and the reward function are not available, the agent can interact with the world, observe what happens, and use that experience to build a policy. It can do so model-based (estimate $T$ and $R$ from experience, then solve the resulting MDP) or model-free (learn the value of actions directly from experience).
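One standard model-free approach is Q-learning. The update rule below is the textbook one; the state/action counts, learning rate, and the single experience tuple are hypothetical, chosen only to show the shape of the computation:

```python
# When T and R are unknown, a Q-learner improves its estimate Q(s, a)
# from observed (s, a, r, s') transitions alone.
alpha, gamma = 0.2, 0.9          # learning rate and discount (made-up values)
Q = {(s, a): 0.0 for s in range(3) for a in range(2)}

def q_update(s, a, r, s_next):
    """One Q-learning backup:
    Q(s,a) <- (1 - alpha) * Q(s,a) + alpha * (r + gamma * max_a' Q(s', a'))."""
    best_next = max(Q[(s_next, a2)] for a2 in range(2))
    Q[(s, a)] = (1 - alpha) * Q[(s, a)] + alpha * (r + gamma * best_next)

# Feed it raw experience instead of a model:
# e.g. taking action 0 in state 1 paid a reward of 1.0 and led to state 2.
q_update(1, 0, 1.0, 2)
```

No transition or reward model ever appears; the estimates converge as more experience tuples are fed in.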

Time: 00:02:55

07 - What to optimize

For example, which sum of rewards should the learner optimize?

- infinite horizon: $\sum_{i=1}^{\infty} r_i$
- fixed (finite) horizon: $\sum_{i=1}^{n} r_i$
- discounted sum: $\sum_{i=1}^{\infty} \gamma^{i-1} r_i$, with $0 < \gamma \le 1$

Remember, in investing, long-term reward should be discounted (e.g. $1 today is worth more than $1 in the future).

The maze problem: the choice of horizon changes which path through the maze is preferred.

Discounted reward: a $\gamma$ near 0 values immediate rewards most, while $\gamma = 1$ recovers the plain sum.
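The effect of discounting can be checked in a couple of lines; the reward stream, horizon, and gamma below are made up for illustration:

```python
# Comparing objectives on the same stream of rewards.
rewards = [1.0, 1.0, 1.0, 1.0, 1.0]
gamma = 0.9

fixed_horizon = sum(rewards[:3])                               # only the first n = 3 steps count
discounted = sum(gamma ** i * r for i, r in enumerate(rewards))  # sum_i gamma^i * r_i

# With gamma = 1 the discounted sum reduces to the plain sum;
# with gamma < 1, a dollar today is worth more than a dollar tomorrow.
```

Here the fixed-horizon objective ignores everything after step 3, while the discounted sum counts every step at a geometrically shrinking weight.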

Time: 00:06:32

08 - Which approach gets 1M

Time: 00:00:21

Solution: see the figure above.

Time: 00:01:11

11 - Summary

Recap: reinforcement learning solves problems framed as MDPs, defined by states, actions, a transition function $T$, and a reward function $R$.

The objective to optimize can be an infinite-horizon sum, a fixed-horizon sum, or a discounted sum of rewards.

Mapping our trading task to reinforcement learning works out like this: the state is what we observe about the market plus our holdings, the actions are buy, sell, and hold, and the reward is the return of our trades.

Time: 00:01:49

Total Time: 00:23:20

2019-03-25 first draft