 
  
  
  
  
 
-  Learning to control sequential processes: manufacturing
optimization problems where the reward is goods produced minus the
costs involved
-  Sequential scheduling: choosing which taxis to send for
passengers in a big city, where the reward is a function of the wait
time of passengers and the total fuel cost of the taxi fleet
-  Specific settings: actions are deterministic or
nondeterministic; the agent does or does not have prior knowledge of
the effects of its actions on the environment
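
The reward signals and the deterministic/nondeterministic distinction above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a formulation given in these notes: the function names, the linear weighting of wait time and fuel cost, and the "slip" model of nondeterminism (the action has the opposite effect with some probability) are all hypothetical.

```python
import random


def manufacturing_reward(goods_produced, costs):
    # Reward for the manufacturing example: goods produced minus costs involved.
    return goods_produced - costs


def taxi_reward(wait_times, fuel_cost, wait_weight=1.0, fuel_weight=1.0):
    # Assumed form: the notes only say the reward is "a function of" wait time
    # and fuel cost. Here it is a negative weighted sum, so shorter waits and
    # less fuel give a higher reward.
    return -(wait_weight * sum(wait_times) + fuel_weight * fuel_cost)


def step(state, action, slip_prob=0.0, rng=random):
    # Toy environment: states are integers on a line, actions are +1 or -1.
    # slip_prob == 0.0 gives deterministic actions; slip_prob > 0.0 makes them
    # nondeterministic (hypothetical model: the action is reversed with
    # probability slip_prob).
    if rng.random() < slip_prob:
        action = -action
    return state + action
```

An agent with prior knowledge of the effects of its actions would be given `step` (and `slip_prob`) in advance; an agent without that knowledge could only call `step` and observe what happens.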
 
Patricia Riddle 
Fri May 15 13:00:36 NZST 1998