This week
- watch Game Theory.
- The readings are Littman (1994), Littman and Stone (2003), Greenwald and Hall (2003), Munoz de Cote and Littman (2008).
- Assignment 10 is up.
data:image/s3,"s3://crabby-images/c5e2a/c5e2a8de47ab7ba3f384c4207589839ac88198dc" alt="Game Theory III"
data:image/s3,"s3://crabby-images/f844c/f844c5a1221d420d6f72386eef9f1f34118cac55" alt="Definition of Game Theory"
- Game theory is mathematics of conflict of interests.
- It generalizes the RL from single agent to multiple agents.
- It is of the interest of economics, politics, sociology or even biology since these fields often deal with agents with many many agents with conflicts of interests.
Example
data:image/s3,"s3://crabby-images/63f47/63f47e0816e090722298dd3e320dc69a0ce5f937" alt="Quiz 1: simple game example"
- The example game is represented as a tree, nodes are states, edges are transitions and leafs are rewards.
- the example game has 2 agent (a and b), all the rewards added up to be a constant ( zero-sum), no stochastic transitions. The leafs show the reward that agent a can get and b gets (-1 * reward).
- Strategies: what action should the agent take at each state it could be in. (equivalent to policy in RL)
- The agents are assumed to be rational.
data:image/s3,"s3://crabby-images/d936c/d936c4f45a53bc0ac1f0f7b115546ce53a1c66d3" alt="Quiz 2: Represent the tree using a matix"
- this is a simple example of 2-agent zero-sum game
- zero-sum means the reward of A and B will sum to 0 for any strategy.
- A matrix is enough to represent a game.
- once the matrix is here, the tree doesn't matter now.
- need to learn to generate the matrix from the tree.
Minimax
data:image/s3,"s3://crabby-images/baf5b/baf5b2685ba69ecc9412ffc463942fcccc184222" alt="MiniMax"
- A and B has the same goal, maximize their own reward ( and minimize others reward).
- If A and B behave rationally, they will reach the same strategy.
data:image/s3,"s3://crabby-images/c83a0/c83a057661318948d5bfa81d33395c976ab677fb" alt="Von Neumann theorem"
This is important so I am writing it down:
- In a 2-agent, zero-sum, deterministic game of perfect information, Minimax ≡ Maximin,
- and there always exist an optimal pure strategy for each agent.
- Based on the strategy matrix, one can always build at least one tree.
Now, to make the problem a bit more complex, we change the game to be non-deterministic:
data:image/s3,"s3://crabby-images/bb23c/bb23cb552f9c0aead2bd17bc6f887407c4432b3b" alt="quiz 3: strategy matrix for non-deterministic game"
- Introduce the chance box, so that transition is non-deterministic.
- construct the strategy matrix based on tree ( but could not reconstruct tree based on matrix)
- note: from the matrix, we don't know if the tree is deterministic or not.
- the minimax theorem (Von Neumann theorem) still holds
- Minimax ≡ Maximin
- Optimal pure strategy exists.
Minipoker
data:image/s3,"s3://crabby-images/9f371/9f371ec7256a12afe60f63a199fc73125cd41cc6" alt="Mini-Poker: description"
data:image/s3,"s3://crabby-images/86b9e/86b9e6b3004e8400b1c40a7fa7a451df395f8b1b" alt="Mini-Poker: Tree and Matrix"
- In the minipoker game, b will not know all the information, so it's a 2-agent, zero-sum, non-deterministic game of hidden information
- In this case, Minimax ≠ Maximin and there will be no optimal pure strategy.
- but there will be mixed Strategy, which is a distribution of strategies.
Mixed Strategy
data:image/s3,"s3://crabby-images/01ee9/01ee9a85d5bb5e798ced3e43fcfdce5e5c78a29a" alt="Quiz 5: Given B's strategy, we can figure out A's expected profit"
- A's expected strategy are linear equation, which can be represented by lines.
data:image/s3,"s3://crabby-images/7ac2d/7ac2d32617506eb10b96824bafa5c2c3c854afa4" alt="Quiz 6: A's expected value is dependent on B"
- the mixed strategy should be at the intercept of the two lines in this case.
- if the two lines both have positive slope, the mixed strategy should be at the far right; if negative slope, the strategy should be at the far left.
- the expected value of the game is deterministic still.
- although the strategy is mixed, there is still an expected value, in this case, the expected value is when p is 0.4, and value is 1.
Snitch
data:image/s3,"s3://crabby-images/6ee8d/6ee8de75e4068f31e2f81515f1a07b7dc2801cf0" alt="Prisonders' Dilemma"
- Now, we are making the game non-zero-sum.
- The prisoner's dilemma is a 2-agent, non-zero-sum, non-deterministic game of hidden information
- Assume the agents are rational, both of them should defect.
A Beautiful Equilibrium
data:image/s3,"s3://crabby-images/c023a/c023ae4e3fcbf698be71fd2e6023bb1ed3601dbb" alt="Nash Equilibrium"
- in practice, if we are in a Nash Equilibrium, any agent will have no good reason to change strategy ( pure or mixed).
data:image/s3,"s3://crabby-images/bc1b7/bc1b75ab988c8a69158bc1ab5f1f0f82479bd285" alt="Theorem"
- in the n-player pure strategy game, if elimination of strictly dominated strategies eliminates all but one combination, that combination is the unique NE.
- Any N.E. will survive elimination of strictly dominated strategies
- if n is finite, for each set of finite strategies, then there will be at least one strategy is N.E.
data:image/s3,"s3://crabby-images/70db2/70db262b0e692c979fa326a69e3046a276f71489" alt="Play the game multiple times: won't change NE"
- what if playing the game multiple times?
- only the last game matters-> the last game will be N.E -> since the last game is known now, the last game moves to the game before it.-> the last game will be N.E. -> repeat.... ->all the game should will be N.E.
Recap
data:image/s3,"s3://crabby-images/414ea/414ea9e9ee3b0a98a98d729e5a2889dc08c7e2cd" alt="Recap"
Andrew Moore's slides on Zero-Sum Games
Andrew Moore's slides on Non-Zero-Sum Games
2015-11-03 初稿
2016-04-26 复习并添加部分内容