 
  
  
  
  
 Next: Delta Rule vs. Perceptron 
Up: Neural Network Learning
 Previous: Stochastic Gradient Descent
 
-  In GD, the error is summed over all training examples before the 
weights are updated; in SGD, the weights are updated after each 
individual training example is examined (a sketch of both update 
loops follows this list)
-  Summing over all the examples in GD requires more computation per 
weight-update step, but because GD uses the true gradient, it is 
often used with a larger step size
-  If there are multiple local minima with respect to the error 
E(w), SGD can sometimes avoid falling into these local minima, 
because it follows the gradient of the per-example error rather than 
the gradient of the total error summed over all examples
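
The difference between the two update rules can be made concrete with 
a minimal Python sketch of a single linear unit trained with the 
squared-error delta rule. The function names, the learning rate eta, 
and the toy data are illustrative assumptions, not part of the 
original notes.

  # Minimal sketch: batch GD vs. SGD for a single linear unit
  # (delta rule, squared error). All names and data are illustrative.

  def output(w, x):
      # Linear unit: o = w . x  (x[0] is assumed to be 1, the bias input)
      return sum(wi * xi for wi, xi in zip(w, x))

  def batch_gradient_descent(examples, w, eta=0.05):
      # GD: accumulate the error gradient over ALL examples,
      # then apply a single weight update for the whole pass.
      delta = [0.0] * len(w)
      for x, t in examples:
          err = t - output(w, x)
          for i, xi in enumerate(x):
              delta[i] += eta * err * xi          # sum contributions first
      return [wi + di for wi, di in zip(w, delta)]  # one update per pass

  def stochastic_gradient_descent(examples, w, eta=0.05):
      # SGD: update the weights after EACH example,
      # using only that example's gradient.
      for x, t in examples:
          err = t - output(w, x)
          w = [wi + eta * err * xi for wi, xi in zip(w, x)]
      return w

  if __name__ == "__main__":
      # Toy data: learn t = 1 + 2*x1 (x[0] = 1 is the bias input).
      examples = [([1.0, 0.0], 1.0), ([1.0, 1.0], 3.0), ([1.0, 2.0], 5.0)]
      w_gd = [0.0, 0.0]
      w_sgd = [0.0, 0.0]
      for _ in range(200):
          w_gd = batch_gradient_descent(examples, w_gd)
          w_sgd = stochastic_gradient_descent(examples, w_sgd)
      print("GD weights: ", w_gd)
      print("SGD weights:", w_sgd)

Both loops approach the same weights on this toy problem; the 
difference is only in when the update is applied, once per pass for 
GD and once per example for SGD.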
 
Patricia Riddle 
Fri May 15 13:00:36 NZST 1998