- To understand the gradient descent algorithm, it is helpful to visualize the entire hypothesis space of possible weight vectors and their associated E values, as shown in the figure below.
- Here the axes w0 and w1 represent possible values for the two weights of a simple linear unit. The w0, w1 plane therefore represents the entire hypothesis space.
- The vertical axis indicates the error E relative to some fixed set of training examples.
- The arrow shows the negated gradient at one particular point, indicating the direction in the w0, w1 plane producing steepest descent along the error surface.
- The error surface shown in the figure thus summarizes the desirability of every weight vector in the hypothesis space.
- Given the way in which we chose to define E, for linear units this error surface must always be parabolic with a single global minimum.
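The parabolic shape follows directly from the definition of E as a sum of squared errors, which is quadratic in the weights. A minimal sketch, using a hypothetical training set for a linear unit o = w0 + w1·x (the data and weights here are illustrative assumptions, not from the text):

```python
import numpy as np

# Hypothetical training data generated by the target weights w0=1, w1=2.
X = np.array([0.0, 1.0, 2.0, 3.0])
t = np.array([1.0, 3.0, 5.0, 7.0])

def error(w0, w1):
    """Squared error E(w) = 1/2 * sum over d of (t_d - o_d)^2."""
    o = w0 + w1 * X          # outputs of the linear unit
    return 0.5 * np.sum((t - o) ** 2)

# E is quadratic in (w0, w1): a paraboloid with one global minimum.
print(error(1.0, 2.0))  # the generating weights give E = 0.0
print(error(0.0, 0.0))  # any other weight vector gives a larger E
```

Because E depends on the weights only through squared terms, every slice of the surface is a parabola, so there are no local minima other than the global one.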
Gradient descent search determines a weight vector that minimizes E by starting with an arbitrary initial weight vector, then repeatedly modifying it in small steps. At each step, the weight vector is altered in the direction that produces the steepest descent along the error surface depicted in the figure above. This process continues until the global minimum error is reached.
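The procedure above can be sketched directly: repeatedly compute the gradient of E and step in the negated-gradient direction. The training data, learning rate, and iteration count below are illustrative assumptions:

```python
import numpy as np

# Hypothetical data for a linear unit o = w0 + w1 * x (true weights: 1, 2).
X = np.array([0.0, 1.0, 2.0, 3.0])
t = np.array([1.0, 3.0, 5.0, 7.0])

eta = 0.05          # learning rate: the size of each small step
w0, w1 = 0.0, 0.0   # arbitrary initial weight vector

for _ in range(2000):
    o = w0 + w1 * X
    # Gradient of E = 1/2 * sum (t - o)^2 with respect to each weight:
    #   dE/dw0 = -sum(t - o),  dE/dw1 = -sum((t - o) * x)
    g0 = -np.sum(t - o)
    g1 = -np.sum((t - o) * X)
    # Move opposite the gradient: the direction of steepest descent.
    w0 -= eta * g0
    w1 -= eta * g1

print(w0, w1)  # approaches the global minimum near (1, 2)
```

Because the surface is parabolic with a single minimum, a sufficiently small learning rate guarantees this iteration converges to the global minimum rather than getting trapped elsewhere.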