 
  
  
  
  
 
-  the error surface of a multilayer neural network may contain many
different local minima where gradient descent can become trapped (the
first sketch after this list illustrates this in one dimension)
-  yet Backpropagation is a highly effective function approximation
method in practice - why?
-  networks with large numbers of weights correspond to error
surfaces in very high-dimensional spaces
-  when gradient descent falls into a local minimum with respect to
one weight, it is not necessarily in a local minimum with respect to
the other weights
-  the more weights, the more dimensions that might provide an
escape route - do I believe this? (would a new random seed, more
hidden nodes, or more training data give the same benefit?)
-  During early gradient descent search the weights are still small,
so each sigmoid unit operates in the nearly linear part of its range
and the network represents a very smooth function; only after the
weights have had time to grow can the network represent highly
nonlinear functions (the second sketch after this list illustrates
this)
-  One might expect more local minima to exist in a region of the
weight space that represents these more complex functions
-  One hopes that by the time the weights reach this point they
have already moved close enough to the global minimum that even a
local minimum in this region is acceptable
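
A minimal sketch in Python of the first point, using a made-up
one-dimensional error surface E(w) = w^4 - 3w^2 + w (chosen only because
it has a global minimum near w = -1.30 and a shallower local minimum near
w = +1.13): which minimum plain gradient descent reaches depends only on
the starting weight.

  def E(w):
      return w**4 - 3*w**2 + w      # made-up error surface with two minima

  def dE(w):
      return 4*w**3 - 6*w + 1       # its derivative

  def gradient_descent(w, learning_rate=0.05, steps=200):
      # repeatedly step downhill along the one-dimensional gradient
      for _ in range(steps):
          w -= learning_rate * dE(w)
      return w

  for w0 in (-0.5, 0.5):
      w_final = gradient_descent(w0)
      print(f"start w = {w0:+.2f}  ->  final w = {w_final:+.3f}, E = {E(w_final):+.3f}")

  # start w = -0.50  ->  final w = -1.301, E = -3.514   (global minimum)
  # start w = +0.50  ->  final w = +1.131, E = -1.070   (trapped in the local minimum)

The same picture applies along every weight direction of a real network at
once, which is why the number of dimensions matters for whether an escape
route exists.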
 
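The claim that early search produces a smooth function can be illustrated
with a similarly made-up sketch: the same randomly initialised
one-hidden-layer sigmoid network is evaluated once with its weights scaled
down (as at the start of training) and once with them scaled up (after the
weights have grown). With small weights every unit stays in the nearly
linear region of the sigmoid, so the output has almost no curvature; with
large weights the same architecture computes a highly nonlinear function.

  import numpy as np

  def sigmoid(x):
      return 1.0 / (1.0 + np.exp(-x))

  def network(x, W1, b1, W2):
      # one sigmoid hidden layer mapping a scalar input to a scalar output
      hidden = sigmoid(np.outer(x, W1) + b1)   # shape: (len(x), hidden units)
      return hidden @ W2

  rng = np.random.default_rng(0)
  W1, b1, W2 = rng.standard_normal(5), rng.standard_normal(5), rng.standard_normal(5)
  x = np.linspace(-3, 3, 7)

  for scale in (0.1, 10.0):                    # small ("early") vs grown weights
      y = network(x, scale * W1, scale * b1, W2)
      # second differences of the output are near zero for an (almost) linear
      # function and much larger once the function is highly nonlinear
      print(f"weight scale {scale:5.1f}: max |second difference| = "
            f"{np.abs(np.diff(y, 2)).max():.4f}")
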
Patricia Riddle 
Fri May 15 13:00:36 NZST 1998