 
-  Common to use a probabilistic approach to selecting actions
-  Actions with higher $\hat{Q}$ values are assigned higher
probabilities, but every action has a non-zero probability
-  $P(a_i \mid s) = \dfrac{k^{\hat{Q}(s,a_i)}}{\sum_j k^{\hat{Q}(s,a_j)}}$
is the probability of selecting action $a_i$, given that the agent is in
state $s$, where $k > 0$ is a constant that determines how strongly the
selection favors actions with high $\hat{Q}$ values
-  For $k > 1$, larger values of $k$ concentrate probability on the actions
with the highest $\hat{Q}$ (exploitation), while values closer to 1 spread
probability more evenly across actions (exploration)
-  Sometimes $k$ is varied with the number of iterations, so the
agent favors exploration during the early stages of learning, then
gradually shifts toward a strategy of exploitation.
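A minimal Python sketch of the selection rule above, assuming the $\hat{Q}$ estimates for the current state are held in a plain list. The function names (`action_probabilities`, `select_action`, `k_schedule`) and the linear schedule for $k$, with its start value, end value and length, are hypothetical choices for illustration, not part of the original notes.

```python
import random


def action_probabilities(q_values, k):
    """Return P(a_i | s) = k**Q(s,a_i) / sum_j k**Q(s,a_j) for each action.

    q_values: Q-hat estimates for the actions available in state s.
    k: constant > 0; values above 1 favor actions with higher Q-hat.
    """
    # Shift every exponent by the largest Q before exponentiating; the common
    # factor k**(-max_q) cancels in the ratio, and for k > 1 it keeps every
    # weight in (0, 1], avoiding overflow.
    max_q = max(q_values)
    weights = [k ** (q - max_q) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


def select_action(q_values, k, rng=random):
    """Sample an action index according to action_probabilities."""
    probs = action_probabilities(q_values, k)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]


def k_schedule(iteration, k_start=1.0, k_end=10.0, n_iterations=1000):
    """Hypothetical linear schedule: k grows from k_start to k_end, then stays.

    k_start = 1 makes every action equally likely (pure exploration);
    a larger k_end makes later choices favor high-Q actions (exploitation).
    """
    frac = min(iteration / n_iterations, 1.0)
    return k_start + frac * (k_end - k_start)


if __name__ == "__main__":
    q = [0.2, 0.5, 0.9]
    rng = random.Random(0)
    for it in (0, 500, 1000):
        k = k_schedule(it)
        probs = [round(p, 3) for p in action_probabilities(q, k)]
        print(it, k, probs, select_action(q, k, rng))
```

Shifting the exponents by the largest $\hat{Q}$ does not change the probabilities; it only keeps the powers of $k$ in a safe numeric range. Any schedule that starts near 1 and increases $k$ over time would serve the same purpose as the linear one shown.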
 
Patricia Riddle 
Fri May 15 13:00:36 NZST 1998