 
  
  
  
  
 
- $\hat{Q}$ values never decrease during training (assuming non-negative rewards and all $\hat{Q}$ values initialized to 0)
- every $\hat{Q}$ value will remain in the interval between 0 and its true $Q$ value
- $\hat{Q}$ will converge to $Q$ if:
- the system is a deterministic MDP,
- immediate rewards are bounded: $|r(s,a)| < c$ for some constant $c$,
- the agent selects actions such that it visits every state-action
pair infinitely often: it must execute $a$ from $s$ with nonzero
frequency as the length of its action sequence approaches infinity
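
The properties listed above can be checked empirically on a small example. The following is a minimal sketch, not taken from the original notes: it assumes a hypothetical five-state corridor world (states 0 to 4, with state 4 an absorbing goal), a reward of 100 for entering the goal and 0 otherwise, a discount factor of 0.9, and the usual deterministic update $\hat{Q}(s,a) \leftarrow r + \gamma \max_{a'} \hat{Q}(s',a')$. The names `step` and `q_learning` are illustrative. Actions are chosen uniformly at random, which is one simple way to visit every state-action pair repeatedly.

```python
import random

GAMMA = 0.9           # discount factor (assumed for this sketch)
N_STATES = 5          # hypothetical corridor: states 0..4
GOAL = N_STATES - 1   # state 4 is an absorbing goal
ACTIONS = (-1, +1)    # move left, move right


def step(s, a):
    """Deterministic transition: move within the corridor, reward 100 on reaching the goal."""
    s2 = min(max(s + a, 0), GOAL)
    r = 100.0 if s2 == GOAL else 0.0
    return s2, r


def q_learning(episodes=500, seed=0):
    """Run deterministic Q-learning with random action selection.

    Returns the learned table and a flag that is True if no entry
    ever decreased during training.
    """
    rng = random.Random(seed)
    # All estimates start at 0, as the monotonicity property requires.
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    never_decreased = True
    for _ in range(episodes):
        s = rng.randrange(GOAL)          # random non-goal start state
        while s != GOAL:
            a = rng.choice(ACTIONS)      # random exploration
            s2, r = step(s, a)
            new = r + GAMMA * max(q[(s2, b)] for b in ACTIONS)
            if new < q[(s, a)]:
                never_decreased = False
            q[(s, a)] = new
            s = s2
    return q, never_decreased


if __name__ == "__main__":
    q, never_decreased = q_learning()
    print("never decreased:", never_decreased)
    for s in range(N_STATES):
        print(s, [round(q[(s, a)], 2) for a in ACTIONS])
```

In this sketch rewards are non-negative and bounded, the world is deterministic, and random exploration keeps revisiting every pair, so the table should settle at the true $Q$ values for this toy world (for instance 100 for moving right from state 3, then 90, 81, and so on back along the corridor).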
 
 
Patricia Riddle 
Fri May 15 13:00:36 NZST 1998