 
  
  
  
  
 
-  the error surface of a multilayer neural network may contain many
different local minima where gradient descent can become trapped (the
first sketch after this list illustrates this in one dimension)
-  yet Backpropagation is a highly effective function approximation
method in practice - why?
-  networks with large numbers of weights correspond to error
surfaces in very high-dimensional spaces
-  when gradient descent falls into a local minimum with respect to
one weight, it is not necessarily in a local minimum with respect to
the other weights
-  the more weights, the more dimensions that might provide an
escape route - do I believe this? (would a new random seed, more
hidden nodes, or more training data give the same benefit?)
-  During early gradient descent search the weights are still small,
so each sigmoid unit operates in the nearly linear part of its range
and the network represents a very smooth function; only after the
weights have had time to grow can the network represent highly
nonlinear functions (the second sketch after this list illustrates
this)
-  One might expect more local minima to exist in a region of the
weight space that represents these more complex functions
-  One hopes that by the time the weights reach this point they
have already moved close enough to the global minimum that even a
local minimum in this region is acceptable
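
A minimal sketch in Python of the first point, using a made-up
one-dimensional error surface E(w) = w^4 - 3w^2 + w (chosen only because
it has a global minimum near w = -1.30 and a shallower local minimum near
w = +1.13): which minimum plain gradient descent reaches depends only on
the starting weight.

  def E(w):
      return w**4 - 3*w**2 + w      # made-up error surface with two minima

  def dE(w):
      return 4*w**3 - 6*w + 1       # its derivative

  def gradient_descent(w, learning_rate=0.05, steps=200):
      # repeatedly step downhill along the one-dimensional gradient
      for _ in range(steps):
          w -= learning_rate * dE(w)
      return w

  for w0 in (-0.5, 0.5):
      w_final = gradient_descent(w0)
      print(f"start w = {w0:+.2f}  ->  final w = {w_final:+.3f}, E = {E(w_final):+.3f}")

  # start w = -0.50  ->  final w = -1.301, E = -3.514   (global minimum)
  # start w = +0.50  ->  final w = +1.131, E = -1.070   (trapped in the local minimum)

The same picture applies along every weight direction of a real network at
once, which is why the number of dimensions matters for whether an escape
route exists.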
 
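The claim that early search produces a smooth function can be illustrated
with a similarly made-up sketch: the same randomly initialised
one-hidden-layer sigmoid network is evaluated once with its weights scaled
down (as at the start of training) and once with them scaled up (after the
weights have grown). With small weights every unit stays in the nearly
linear region of the sigmoid, so the output has almost no curvature; with
large weights the same architecture computes a highly nonlinear function.

  import numpy as np

  def sigmoid(x):
      return 1.0 / (1.0 + np.exp(-x))

  def network(x, W1, b1, W2):
      # one sigmoid hidden layer mapping a scalar input to a scalar output
      hidden = sigmoid(np.outer(x, W1) + b1)   # shape: (len(x), hidden units)
      return hidden @ W2

  rng = np.random.default_rng(0)
  W1, b1, W2 = rng.standard_normal(5), rng.standard_normal(5), rng.standard_normal(5)
  x = np.linspace(-3, 3, 7)

  for scale in (0.1, 10.0):                    # small ("early") vs grown weights
      y = network(x, scale * W1, scale * b1, W2)
      # second differences of the output are near zero for an (almost) linear
      # function and much larger once the function is highly nonlinear
      print(f"weight scale {scale:5.1f}: max |second difference| = "
            f"{np.abs(np.diff(y, 2)).max():.4f}")
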
Patricia Riddle 
Fri May 15 13:00:36 NZST 1998