Bias-Variance Tradeoff

Bias and variance are two sources of prediction error. There is a trade-off between a model’s ability to minimize bias and its ability to minimize variance, and understanding this trade-off is what guides the choice of a good value for the regularisation constant.

Bias is the difference between the average prediction of our model (averaged over models trained on different training sets) and the correct value we are trying to predict. A model with high bias pays very little attention to the training data and oversimplifies the relationship it is trying to learn. This leads to high error on both training and test data.
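To make the definition precise, here is the standard textbook decomposition, written under the assumption that data are generated as y = f(x) + ε, where ε is noise with mean zero and variance σ² independent of the training set, and where f̂(x; D) denotes the prediction at input x of a model trained on a training set D. Expectations are taken over training sets D (and the noise ε):

```latex
\begin{aligned}
\operatorname{Bias}\big[\hat f(x)\big] &= \mathbb{E}_D\big[\hat f(x; D)\big] - f(x) \\
\operatorname{Var}\big[\hat f(x)\big] &= \mathbb{E}_D\Big[\big(\hat f(x; D) - \mathbb{E}_D[\hat f(x; D)]\big)^2\Big] \\
\mathbb{E}_{D,\varepsilon}\Big[\big(y - \hat f(x; D)\big)^2\Big] &= \operatorname{Bias}\big[\hat f(x)\big]^2 + \operatorname{Var}\big[\hat f(x)\big] + \sigma^2
\end{aligned}
```

A high-bias model makes the first term large, a high-variance model makes the second term large, and σ² is irreducible noise that no model can remove.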

In supervised learning, overfitting happens when the model captures the noise along with the underlying pattern in the data. This typically happens when the model is too flexible, for example when it has a large number of parameters relative to the amount of training data. Such models have low bias and high variance. On the other hand, if the model is too simple and has very few parameters, it may underfit: it then has high bias and low variance.

A model with high variance pays a lot of attention to the training data and does not generalize to data it hasn’t seen before. As a result, such models perform very well on training data but have high error rates on test data.

So we need to find a good balance that neither overfits nor underfits the data. Making a model more complex tends to reduce bias but increase variance, while making it simpler does the opposite; this is why there is a trade-off between bias and variance. An algorithm can’t be more complex and less complex at the same time.
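One way to see this balance is to sweep model complexity and compare training error with error on held-out data. This is a small sketch under the same kind of hypothetical setup (noisy sin(2πx) data, polynomial degree as the complexity knob, 20 training points and a large held-out set standing in for unseen data); the helper names `make_data` and `mse` are illustrative.

```python
import numpy as np

# Hypothetical setup for illustration: y = sin(2*pi*x) + Gaussian noise (sd 0.3).
rng = np.random.default_rng(1)

def make_data(n, noise_sd=0.3):
    x = rng.uniform(0, 1, n)
    y = np.sin(2 * np.pi * x) + rng.normal(0, noise_sd, n)
    return x, y

x_train, y_train = make_data(20)   # small training set
x_test, y_test = make_data(1000)   # large held-out set approximating unseen data

def mse(coefs, x, y):
    # Mean squared error of the polynomial with the given coefficients.
    return float(np.mean((np.polyval(coefs, x) - y) ** 2))

train_err, test_err = {}, {}
for degree in range(10):
    coefs = np.polyfit(x_train, y_train, degree)
    train_err[degree] = mse(coefs, x_train, y_train)
    test_err[degree] = mse(coefs, x_test, y_test)
    print(f"degree={degree}: train MSE={train_err[degree]:.3f}, "
          f"test MSE={test_err[degree]:.3f}")

best_degree = min(test_err, key=test_err.get)
print("degree with the lowest test error:", best_degree)
```

Training error can only go down as the degree grows, because each polynomial family contains the previous one; test error typically falls at first (bias shrinking) and then rises again (variance growing), and the sweet spot lies in between.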

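One way of finding a good bias-variance trade-off is to tune the complexity of the model via regularisation. Regularisation is also a very useful method for handling situations where there is high correlation between features. The concept behind regularisation is to introduce additional information (bias) to penalise extreme parameter (weight) values. The most common form of regularisation is so-called L2 regularisation (sometimes also called L2 shrinkage or weight decay), which adds a penalty proportional to the sum of the squared weights. There is also L1 regularisation, which penalises the sum of the absolute weights and can drive some weights to exactly zero. In both cases the regularisation constant controls the strength of the penalty: larger values give simpler models with higher bias and lower variance.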
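For linear least squares, L2 regularisation gives ridge regression, which minimises the sum of squared residuals plus λ times the sum of squared weights and has the closed-form solution w = (XᵀX + λI)⁻¹Xᵀy. The sketch below is a minimal illustration, assuming a model without an intercept and hypothetical data with two almost identical (highly correlated) features; the helper name `ridge_fit`, the data and the λ values are illustrative.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form L2-regularised least squares (ridge regression):
    w = (X^T X + lam * I)^(-1) X^T y. No intercept, for simplicity."""
    n_features = X.shape[1]
    A = X.T @ X + lam * np.eye(n_features)
    return np.linalg.solve(A, X.T @ y)

rng = np.random.default_rng(0)
n = 50
x1 = rng.normal(size=n)
# Hypothetical second feature that is almost a copy of the first.
x2 = x1 + rng.normal(scale=0.01, size=n)
X = np.column_stack([x1, x2])
y = 3 * x1 + rng.normal(scale=0.5, size=n)

for lam in (0.0, 0.1, 1.0, 10.0):
    w = ridge_fit(X, y, lam)
    print(f"lambda={lam:>5}: weights={np.round(w, 2)}, "
          f"||w||={np.linalg.norm(w):.2f}")
```

With λ = 0 this is ordinary least squares, and the two weights are typically large and of opposite sign because the data barely determine how to split the effect between two near-copies of a feature. As λ grows, the norm of the weight vector shrinks and the weights move toward sharing the effect. L1 regularisation has no closed form like this and is usually fitted iteratively, for example by coordinate descent.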

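To find an acceptable bias-variance trade-off, we need to evaluate our model carefully. A common technique is k-fold cross-validation, which can help us estimate how well the model performs on unseen data: the training data is split into k folds, the model is trained on k-1 of them and evaluated on the remaining one, and this is repeated k times so that the k scores can be averaged.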
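The procedure can be written from scratch with NumPy and used to choose the regularisation constant. This is a minimal sketch, assuming ridge regression as the model being tuned and hypothetical synthetic data (60 samples, 30 features, only three of which matter); k = 5, the candidate λ values and the helper names `ridge_fit` and `k_fold_mse` are illustrative, not from the text. Libraries such as scikit-learn provide the splitting ready-made (`sklearn.model_selection.KFold`, `cross_val_score`).

```python
import numpy as np

def ridge_fit(X, y, lam):
    # Closed-form ridge regression (no intercept), used as the model to tune.
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def k_fold_mse(X, y, lam, k=5, seed=0):
    """Average validation MSE of ridge regression over k folds."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(len(y))        # shuffle once, then split
    folds = np.array_split(indices, k)
    errors = []
    for i in range(k):
        val_idx = folds[i]                   # fold i is held out
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        w = ridge_fit(X[train_idx], y[train_idx], lam)
        errors.append(np.mean((X[val_idx] @ w - y[val_idx]) ** 2))
    return float(np.mean(errors))

# Hypothetical data: 60 samples, 30 features, only the first 3 features matter.
rng = np.random.default_rng(42)
X = rng.normal(size=(60, 30))
true_w = np.zeros(30)
true_w[:3] = [2.0, -1.0, 0.5]
y = X @ true_w + rng.normal(scale=1.0, size=60)

lambdas = [0.0, 0.1, 1.0, 10.0, 100.0, 1000.0]
cv_scores = {lam: k_fold_mse(X, y, lam) for lam in lambdas}
for lam, score in cv_scores.items():
    print(f"lambda={lam:>7}: 5-fold CV MSE={score:.3f}")
best_lam = min(cv_scores, key=cv_scores.get)
print("selected lambda:", best_lam)
```

The λ with the lowest average validation error is the one selected. Very small values tend to overfit (high variance) and very large values underfit (high bias), so the cross-validation error is usually lowest somewhere in between.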
