

---
layout: post
title: "AI Notes Week 15: Planning under uncertainty"
date: "2018-12-02 17:11:33"
categories: Computer Science
excerpt: "Link to the videosLink to the video transcripts Introduction This lectur..."
auth: conge
---

Link to the videos Link to the video transcripts

Introduction

This lecture focuses on marrying planning and uncertainty to drive robots in actual physical environments and to find good plans for these robots to execute.

Planning Under Uncertainty MDP

Planning methods are categorized based on the characteristics of the world: observability and certainty.

MDP

Robot Tour Guide Examples

All these robots need to deal with uncertainty and partial observability to do their jobs (tour guide or mine explorer).

MDP Grid World


Absorbing states: the search ends when the agent reaches an absorbing state. A policy assigns an action based on the state the agent is in.
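These two ideas can be sketched in a few lines of code. The layout below is an assumption chosen to match the lecture's flavor, not taken from the notes: +100 at a4, -100 at b4, a wall at b2, and a per-step cost of -3.

```python
# Grid world sketch: rows a-c, columns 1-4; b2 is a wall (assumed layout).
ABSORBING = {"a4": +100, "b4": -100}   # assumed terminal rewards

# A policy maps every non-absorbing state to an action.
policy = {"a1": "E", "a2": "E", "a3": "E",
          "b1": "N", "b3": "N",
          "c1": "N", "c2": "W", "c3": "W", "c4": "W"}

def move(state, action):
    """Deterministic move; hitting a wall or the edge leaves the agent in place."""
    dr, dc = {"N": (-1, 0), "S": (1, 0), "E": (0, 1), "W": (0, -1)}[action]
    r, c = ord(state[0]) - ord("a") + dr, int(state[1]) + dc
    nxt = chr(ord("a") + r) + str(c)
    if not (0 <= r <= 2 and 1 <= c <= 4) or nxt == "b2":
        return state
    return nxt

def run_episode(start):
    """Follow the policy until an absorbing state ends the search."""
    state, total = start, 0
    while state not in ABSORBING:
        state = move(state, policy[state])
        total += -3                       # assumed per-step cost
    return total + ABSORBING[state]
```

Starting from c1, this policy walks the agent north and then east, paying -3 per step until the episode ends at the +100 state.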

Problems With Conventional Planning 1

Policy Question

Question: what is the best action to take when an agent is in states a1, c1, c4 and b3?

MDP And Costs

The reason that the agent should be avoiding the b4 state is the cost.

Value Iteration

Intuition of VI functions
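The intuition behind value iteration is the standard Bellman update (AIMA Ch. 17): the value of a state is its immediate reward plus the discounted expected value of the best action's successors.

```latex
V(s) \;\leftarrow\; R(s) + \gamma \max_{a} \sum_{s'} P(s' \mid s, a)\, V(s')
```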

Quiz

Deterministic question:

Quiz: calculate the value when the agent is at each state.

Stochastic question:

The calculation is more complicated here because the expected reward of each action must be evaluated over all of its possible outcomes.

Value Iterations And Policy

The policy can be defined by the value function after the value of each cell is calculated. The action policy is to choose the action which leads to the highest path reward.

If the cost of each state is positive, the policy will encourage the agent to stay in the current state forever.

If the cost is too low, the value of each state might become so low that the agent will try to end the search as soon as possible, without looking for an optimal solution.
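A compact value-iteration sketch over the same grid follows. All the numbers are assumptions chosen to match the lecture's flavor: +100/-100 terminals at a4/b4, a wall at b2, step cost -3, 80% intended motion with 10% side-slips each way, and γ = 1.

```python
# Value iteration sketch (assumed: +100 at a4, -100 at b4, wall at b2,
# step cost -3, 80/10/10 stochastic motion, gamma = 1).
GAMMA, COST = 1.0, -3.0
TERMINAL = {"a4": 100.0, "b4": -100.0}
STATES = [r + str(c) for r in "abc" for c in range(1, 5) if r + str(c) != "b2"]
MOVES = {"N": (-1, 0), "S": (1, 0), "E": (0, 1), "W": (0, -1)}
LEFT = {"N": "W", "W": "S", "S": "E", "E": "N"}   # side-slip to the left
RIGHT = {v: k for k, v in LEFT.items()}           # side-slip to the right

def move(state, action):
    """Deterministic successor; walls and edges leave the agent in place."""
    dr, dc = MOVES[action]
    r, c = ord(state[0]) - ord("a") + dr, int(state[1]) + dc
    nxt = chr(ord("a") + r) + str(c)
    return state if not (0 <= r <= 2 and 1 <= c <= 4) or nxt == "b2" else nxt

def q_value(V, s, a):
    """Expected successor value: 80% intended move, 10% each side-slip."""
    return (0.8 * V[move(s, a)] + 0.1 * V[move(s, LEFT[a])]
            + 0.1 * V[move(s, RIGHT[a])])

def value_iteration(eps=1e-6):
    """Sweep the Bellman update until the largest change falls below eps."""
    V = {s: TERMINAL.get(s, 0.0) for s in STATES}
    while True:
        delta = 0.0
        for s in STATES:
            if s in TERMINAL:
                continue
            new = COST + GAMMA * max(q_value(V, s, a) for a in MOVES)
            delta, V[s] = max(delta, abs(new - V[s])), new
        if delta < eps:
            return V

def greedy_policy(V):
    """The policy picks the action with the highest expected successor value."""
    return {s: max(MOVES, key=lambda a: q_value(V, s, a))
            for s in STATES if s not in TERMINAL}
```

With these numbers the greedy policy steers east along the top row toward +100 and routes around b4; flipping COST to a positive value instead makes lingering in non-terminal states the best choice, matching the observation about positive costs above.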

MDP Conclusion

Conclusion


POMDP Vs MDP

POMDP

conventional planning

MDP

POMDP

POMDP wouldn't work

Here is a solution that does not work when there are two possible worlds: the agent might be in either of the two worlds, and it does not know which. Solving the problem separately for each case and then averaging the two solutions fails, because the averaged policy will never send the agent south to gather information.

POMDP will work

POMDP planning on belief states will work. If the agent goes south and reaches the sign, there is a 50% chance it transitions to the right-side belief state; running MDP planning there reaches the +100 state. The same happens for the left-side belief state (the other 50%).
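A minimal sketch of that belief transition, assuming (for illustration) a two-world setup where the sign is perfectly informative:

```python
# Belief-state update for the two-world example. Assumption: the sign is
# perfectly informative, i.e. P(obs | world) is either 1 or 0.
def belief_update(belief, obs, obs_model):
    """Bayes rule: b'(w) is proportional to P(obs | w) * b(w)."""
    unnorm = {w: obs_model[w][obs] * p for w, p in belief.items()}
    z = sum(unnorm.values())
    return {w: p / z for w, p in unnorm.items()}

# The agent starts 50/50 between the two mirror-image worlds.
belief = {"left": 0.5, "right": 0.5}
obs_model = {"left":  {"LEFT": 1.0, "RIGHT": 0.0},
             "right": {"LEFT": 0.0, "RIGHT": 1.0}}

# Going south to read the sign collapses the belief to a single world,
# after which ordinary MDP planning reaches the +100 state.
belief = belief_update(belief, "RIGHT", obs_model)
```

With a noisy sign the same update would merely sharpen the belief instead of collapsing it, and the agent might need several readings.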

Readings on Planning under Uncertainty

AIMA: Chapter 17

Further Study

Charles Isbell and Michael Littman’s ML course:

2018-12-01 First draft