Games

CS221: Artificial Intelligence (Autumn 2012) Percy Liang
Outline
Markov decision processes
- Definition
- Policy evaluation in MDPs
- Policy optimization in MDPs
  - Value iteration
  - Policy iteration
- Examples

Games
- Policy evaluation in games
- Policy optimization in games
  - Minimax
  - Alpha-beta pruning
  - Evaluation functions
Games: nondeterministic state space models
So far: deterministic state space models
Model: [figure: a search graph with states S, A, B, C, D, E, F, G connected by edges with costs such as 1, 2, 3, 5, 7]

When the agent takes action a in state s, it ends up in state Succ(s, a) deterministically.

The real world

The dynamics of the real world are not known...
- Robotics: decide where to move, but actuators can fail, hit unseen obstacles, etc.
- Resource allocation: decide what to produce, but don't know the customer demand for the various products.
- Agriculture: decide what to plant, but don't know the weather and thus the crop yield.

How to maximize utility in these situations?
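Before adding randomness, here is a minimal sketch of the deterministic model from the slide above: a successor function Succ and a per-action cost fully determine the outcome of any action sequence. The concrete states, actions, and costs below are illustrative placeholders, not the exact graph from the figure.

```python
# Deterministic state-space model: Succ(s, a) is the unique next state,
# Cost(s, a) the cost of taking action a in state s (illustrative values).
SUCC = {("S", "up"): "A", ("A", "right"): "G",
        ("S", "down"): "C", ("C", "right"): "G"}
COST = {("S", "up"): 2, ("A", "right"): 5,
        ("S", "down"): 3, ("C", "right"): 7}

def follow(start, actions):
    """Execute a fixed action sequence; the outcome is fully determined."""
    state, total = start, 0
    for a in actions:
        total += COST[(state, a)]
        state = SUCC[(state, a)]
    return state, total

print(follow("S", ["up", "right"]))    # ('G', 7)
print(follow("S", ["down", "right"]))  # ('G', 10)
```

In the robotics, resource-allocation, and agriculture settings above, the same action sequence can lead to different outcomes, which is exactly what Markov decision processes will model.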
Game 1
You choose stay or quit.
- If quit, you get $10 and we stop.
- If stay, you get $4. Then I roll a 6-sided die. If the die comes up 1 or 2, we stop. Otherwise, you get to play again.
Suppose utility is money. Let's play!
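Before computing the exact answer, a quick simulation makes the rules concrete. This is a sketch, not course code; the function names play and average_utility are illustrative.

```python
import random

def play(policy, rng=random):
    """Play Game 1 once with a fixed choice ("stay" or "quit"); return utility in $."""
    utility = 0
    while True:
        if policy == "quit":
            return utility + 10          # quit: collect $10, game ends
        utility += 4                     # stay: collect $4, then roll the die
        if rng.randint(1, 6) <= 2:       # 1 or 2: game ends
            return utility               # 3-6: play another round

def average_utility(policy, n=100_000, seed=0):
    """Estimate the expected utility of always making the same choice."""
    rng = random.Random(seed)
    return sum(play(policy, rng) for _ in range(n)) / n

print(average_utility("quit"))   # exactly 10.0
print(average_utility("stay"))   # about 12: on average 3 rounds at $4 each
```

The empirical averages already suggest that always staying beats quitting; the next slide derives this exactly.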
Computing the optimal solution

[Figure: from state "in", action quit yields $10 and moves to "out"; action stay yields $4 and returns to "in" with probability 2/3 or moves to "out" with probability 1/3.]

Let $V_a$ be the expected utility if we always take action $a$ (in state "in"). Then
$V_{\text{quit}} = 10$ and $V_{\text{stay}} = 4 + \tfrac{2}{3} V_{\text{stay}}$,
so $V_{\text{stay}} = 12 > V_{\text{quit}}$: the optimal action is to stay.
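The same value can also be obtained numerically by iterating the equation above to its fixed point, a one-state instance of iterative policy evaluation. A minimal sketch, assuming the rewards and probabilities from the slide:

```python
def evaluate_stay(eps=1e-10):
    """Fixed-point iteration for V_stay = 4 + (2/3) * V_stay."""
    V = 0.0
    while True:
        V_new = 4 + (2.0 / 3.0) * V      # $4 now, plus a 2/3 chance of being back "in"
        if abs(V_new - V) < eps:
            return V_new
        V = V_new

V_stay = evaluate_stay()                 # converges to 12.0
V_quit = 10.0
print(V_stay, "-> optimal action:", "stay" if V_stay > V_quit else "quit")
```

The iteration converges because each update shrinks the error by a factor of 2/3.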
Markov property and states
What about strategy