- If the training examples are not linearly separable, the delta rule converges toward a best-fit approximation to the target concept.
- The key idea behind the delta rule is to use gradient descent to search the hypothesis space of possible weight vectors to find the weights that best fit the training examples.
To understand the delta training rule, consider the task of training an unthresholded perceptron. That is, a linear unit for which the output O is given by
Where,
- D is the set of training examples,
- td is the target output for training example d,
- od is the output of the linear unit for training example d
- E(wvector) is simply half the squared difference between the target output td and the linear unit output od, summed over all training examples.
0 Comments