
Generalization

What is generalization?

Even if the training loss decreases as expected, that does not automatically mean that whatever the model has learned is also useful. This is where the validation loss comes into play. Things look good if the validation loss decreases alongside the training loss: in that case, the learned patterns seem to generalize to the unseen validation data. The validation loss will typically stay somewhat higher than the training loss, however, since not all patterns generalize, as you can see in the following graphic.

If validation loss decreases as well, the learned patterns seem to generalize.
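
To make this concrete, here is a minimal sketch of how you might track both losses during training. It uses scikit-learn on a made-up synthetic dataset; the model and hyperparameters are placeholders for illustration, not a recommendation.

import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Made-up noisy data standing in for a real dataset.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(500, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=500)

# Hold out part of the data as a validation set.
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

model = SGDRegressor(learning_rate="constant", eta0=0.01, random_state=0)
for epoch in range(50):
    model.partial_fit(X_train, y_train)  # one pass over the training data
    train_loss = mean_squared_error(y_train, model.predict(X_train))
    val_loss = mean_squared_error(y_val, model.predict(X_val))
    print(f"epoch {epoch:2d}  train loss {train_loss:.3f}  val loss {val_loss:.3f}")

If both printed losses keep falling, the model is learning patterns that carry over to unseen data; the validation loss is what tells you so.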

Bias

Bias is the systematic part of a model's error: the difference between the model's average prediction and the true values. It is a measure of how well your model fits the data. Zero bias would mean that the model captures the true data-generating process perfectly, and both your training and validation loss could get very low. They will never reach zero in practice, though, because real data is almost always noisy; that remaining, noise-driven part of the loss is called the irreducible error.

In any case, if the losses do not decrease as expected, that probably signals that the model is not a good fit for the data. This happens, for example, if you try to fit an exponential relationship with a linear model: a straight line simply cannot capture that relationship adequately. In that case, try a different, more flexible model.
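
As an illustration, here is a small sketch of that exponential example with scikit-learn; the data is made up, and the cubic polynomial is just one arbitrary choice of a more flexible model.

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Made-up data with an exponential relationship plus noise.
rng = np.random.default_rng(1)
X = rng.uniform(0, 3, size=(300, 1))
y = np.exp(X).ravel() + rng.normal(scale=0.5, size=300)

# A plain linear model cannot capture the curvature: the error stays high (high bias).
linear = LinearRegression().fit(X, y)
print("linear MSE:", mean_squared_error(y, linear.predict(X)))

# A more flexible model (cubic polynomial features) reduces that bias.
cubic = make_pipeline(PolynomialFeatures(degree=3), LinearRegression()).fit(X, y)
print("cubic MSE: ", mean_squared_error(y, cubic.predict(X)))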

You may also call this underfitting, though with a slightly different connotation. Unlike high bias, underfitting implies that the model still has capacity to learn, so you would simply train for more iterations or collect more data.

Importantly, biases may also be hidden in the training data itself, which is easily overlooked. Your training loss may decrease as usual in that case; only testing on real-world data can reveal such a bias.


Variance

A model is said to have high variance if what it learns is sensitive to small changes in the training data. You can think of it as the fitted surface between the data points not being smooth but very wiggly. That is usually not what you want. High variance often means overfitting, because the model seems to have captured random noise or outliers rather than the underlying pattern.
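
A quick way to see this is to refit a very flexible model on slightly different resamples of the same data and watch its prediction at one fixed point jump around. The sketch below does that with a high-degree polynomial; the data and the degree are arbitrary choices for illustration.

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Made-up noisy data.
rng = np.random.default_rng(2)
X = np.linspace(0, 1, 30).reshape(-1, 1)
y = np.sin(2 * np.pi * X).ravel() + rng.normal(scale=0.2, size=30)

x_query = np.array([[0.5]])  # a fixed point at which we compare predictions

# Refit the same very flexible model on bootstrap resamples of the data.
# Its prediction at the same point varies a lot: that is high variance.
for seed in range(5):
    idx = np.random.default_rng(seed).integers(0, len(X), size=len(X))
    wiggly = make_pipeline(PolynomialFeatures(degree=15), LinearRegression())
    wiggly.fit(X[idx], y[idx])
    print(f"resample {seed}: prediction at x=0.5 -> {wiggly.predict(x_query)[0]:.2f}")

A smoother model, such as a low-degree polynomial, would give much more stable predictions across the resamples.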

As with high bias and underfitting, high variance and overfitting are closely related, but they are not entirely equivalent in meaning.

Overfitting

At some point during the training of a model, the validation loss usually levels out (and sometimes even starts to increase again) while the training loss continues to decrease. That signals overfitting. In other words, the model is still learning patterns but they do not generalize beyond the training set (see graphic below). Overfitting is particularly typical for models that have a large number of parameters, like deep neural networks.

Overfitting can happen after a certain number of training iterations.

A large gap between training and validation loss is a hint that the model does not generalize well, and you may want to try to narrow that gap (graphic below). The simplest remedy for overfitting is early stopping, that is, stopping the training loop as soon as the validation loss begins to level off. Alternatively, regularization may help (see below). Underfitting, on the other hand, may happen if you stop too early.

Generalization is low if there is a large gap between training and validation loss.
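
Here is a minimal sketch of early stopping with a simple patience rule, again using scikit-learn on made-up data; the patience of 5 epochs and the improvement threshold are arbitrary placeholder values.

import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Made-up linear data with noise.
rng = np.random.default_rng(3)
X = rng.normal(size=(400, 10))
y = X @ rng.normal(size=10) + rng.normal(scale=0.5, size=400)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.25, random_state=0)

model = SGDRegressor(learning_rate="constant", eta0=0.01, random_state=0)
best_val, patience, bad_epochs = np.inf, 5, 0
for epoch in range(200):
    model.partial_fit(X_train, y_train)
    val_loss = mean_squared_error(y_val, model.predict(X_val))
    if val_loss < best_val - 1e-4:  # validation loss is still improving
        best_val, bad_epochs = val_loss, 0
    else:                           # no meaningful improvement this epoch
        bad_epochs += 1
    if bad_epochs >= patience:      # validation loss has levelled off: stop
        print(f"stopping early at epoch {epoch}, best validation loss {best_val:.3f}")
        break

Note that SGDRegressor also ships with a built-in early_stopping option; the explicit loop above is only meant to show the idea.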

Regularization

Regularization is a method to avoid high variance and overfitting and thereby to improve generalization. Without getting into details, regularization aims to keep the model's coefficients close to zero. Intuitively, the function the model represents is then simpler and less wiggly, so predictions are smoother and overfitting is less likely (graphic below). Regularization can be as simple as shrinking or penalizing large coefficients, often called weight decay. L1 and L2 regularization are two widely used methods, but you may also encounter other forms, such as dropout in neural networks.

Regularization can help avoid high variance and overfitting.
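
As a small illustration of the shrinking effect, the sketch below fits the same high-degree polynomial once without and once with L2 regularization (ridge regression) and compares the coefficient sizes; the data, the polynomial degree, and the penalty strength alpha are made up for the example.

import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Made-up noisy data.
rng = np.random.default_rng(4)
X = np.linspace(0, 1, 25).reshape(-1, 1)
y = np.sin(2 * np.pi * X).ravel() + rng.normal(scale=0.2, size=25)

# Unregularized fit: coefficients can blow up and the fitted curve gets very wiggly.
plain = make_pipeline(PolynomialFeatures(degree=12), LinearRegression()).fit(X, y)

# L2-regularized (ridge) fit: large coefficients are penalized, the curve stays smoother.
ridge = make_pipeline(PolynomialFeatures(degree=12), Ridge(alpha=1.0)).fit(X, y)

print("largest |coefficient| without regularization:", np.abs(plain[-1].coef_).max())
print("largest |coefficient| with L2 regularization: ", np.abs(ridge[-1].coef_).max())

With L1 regularization (lasso), many coefficients would be driven exactly to zero instead of merely shrunk.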

To sum it all up: learning is all well and good, but generalization is what we really want. A good model should therefore have both low bias and low variance, and both overfitting and underfitting should be avoided. Regularization may be part of the solution to all of that.

