Bias-Variance Tradeoff

In machine learning, bias and variance are two critical sources of error in models.

1. Bias:

  • Definition: Bias is the error due to overly simplistic assumptions in the learning algorithm. High bias can cause the algorithm to miss the relevant relations between features and target outputs (underfitting), thereby leading to poor performance on both the training and testing data.
  • Example: Imagine you are trying to predict the price of houses based on their size (in square feet), and you decide to use a simple linear regression model. However, the true relationship between size and price is actually more complex (e.g., a quadratic relationship). In this scenario, your linear model cannot capture the true relationship well: it assumes that the relationship between size and price is a straight line, which is a significant simplification. This is an example of high bias (see the code sketch after this list).

2. Variance:

  • Definition: Variance is the error due to the model's sensitivity to small fluctuations in the training data, typically caused by too much complexity in the learning algorithm. High variance can cause overfitting, which means the algorithm models the random noise in the training data rather than the intended outputs. This leads to excellent performance on the training data but poor performance on unseen/test data.
  • Example: Continuing with the house price prediction, suppose you now decide to use a polynomial regression model of degree 20. This model is extremely flexible and can capture very complex relationships between size and price. However, it might adapt too much to the training data, capturing the noise and fluctuations, so the model performs poorly on new, unseen data. This is an example of high variance; the sketch below contrasts it with the high-bias linear fit.
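Both failure modes can be sketched in a few lines of Python. This is only an illustration under assumed data: the quadratic ground-truth curve, the coefficients, and the noise level below are made up for the example, and scikit-learn is used for the fitting.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

rng = np.random.default_rng(0)

# Synthetic house data: size in thousands of square feet, price in $1000s.
# The quadratic relationship and the noise level are illustrative assumptions.
size = rng.uniform(0.5, 3.5, (60, 1))
price = 50 + 40 * size.ravel() + 10 * size.ravel() ** 2 + rng.normal(0, 15, 60)

# Hold out part of the data so over- and underfitting become visible.
X_train, X_test = size[:40], size[40:]
y_train, y_test = price[:40], price[40:]

# High bias: a straight line cannot represent the quadratic relationship.
high_bias = LinearRegression().fit(X_train, y_train)

# High variance: a degree-20 polynomial is flexible enough to chase the noise
# (features are standardised first to keep the fit numerically stable).
high_variance = make_pipeline(
    StandardScaler(), PolynomialFeatures(degree=20), LinearRegression()
).fit(X_train, y_train)

for name, model in [("linear (high bias)", high_bias),
                    ("degree-20 (high variance)", high_variance)]:
    train_mse = mean_squared_error(y_train, model.predict(X_train))
    test_mse = mean_squared_error(y_test, model.predict(X_test))
    print(f"{name:>26}: train MSE = {train_mse:7.1f}, test MSE = {test_mse:7.1f}")
```

Typically the linear model shows a similar, fairly high error on both splits, while the degree-20 model fits the training set closely but does noticeably worse on the held-out points.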

Bias and variance are both components of a model's prediction error. There is a trade-off between a model's ability to minimize bias and its ability to minimize variance; finding the balance point between the two is what guides choices such as model complexity and the value of a regularisation constant.

In supervised learning, overfitting happens when the model captures the noise along with the underlying pattern in the data. This typically occurs when the model has a large number of parameters; such models have low bias and high variance. On the other hand, if the model is too simple and has very few parameters, it may have high bias and low variance.

A model with high variance pays a lot of attention to the training data and does not generalize to data it has not seen before. As a result, such models perform very well on training data but have high error rates on test data.

So we need to find the right balance, neither overfitting nor underfitting the data. This trade-off in complexity is why there is a trade-off between bias and variance: an algorithm cannot be both more complex and less complex at the same time.
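One rough way to see where that balance lies (continuing the assumed synthetic house-price data from the earlier sketch) is to sweep model complexity and watch training and validation error move in opposite directions:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

rng = np.random.default_rng(1)
size = rng.uniform(0.5, 3.5, (80, 1))                 # thousands of sq ft (assumed data)
price = 50 + 40 * size.ravel() + 10 * size.ravel() ** 2 + rng.normal(0, 15, 80)

X_train, X_val, y_train, y_val = train_test_split(
    size, price, test_size=0.25, random_state=0
)

# Sweep the polynomial degree (model complexity). Training error keeps falling
# as the degree grows, while validation error tends to bottom out near the true
# complexity and then rise again as the model starts to overfit.
for degree in range(1, 11):
    model = make_pipeline(
        StandardScaler(), PolynomialFeatures(degree=degree), LinearRegression()
    ).fit(X_train, y_train)
    train_mse = mean_squared_error(y_train, model.predict(X_train))
    val_mse = mean_squared_error(y_val, model.predict(X_val))
    print(f"degree {degree:2d}: train MSE = {train_mse:6.1f}, val MSE = {val_mse:6.1f}")
```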

One way of finding a good bias-variance trade-off is to tune the complexity of the model via regularisation. Regularisation is also a very useful method for handling situations where there is high correlation between features. The idea behind regularisation is to introduce additional information (bias) that penalizes extreme parameter (weight) values. The most common form is L2 regularisation (sometimes also called L2 shrinkage or weight decay), which penalizes the squared magnitude of the weights; L1 regularisation, which penalizes their absolute values and can drive some weights to exactly zero, is the other widely used form.
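As a minimal sketch of L2 regularisation (again on the assumed synthetic house-price data, with an arbitrary penalty strength of alpha=1.0), scikit-learn's Ridge estimator adds a penalty on the squared weights to the least-squares objective:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

rng = np.random.default_rng(2)
size = rng.uniform(0.5, 3.5, (60, 1))                 # assumed synthetic data
price = 50 + 40 * size.ravel() + 10 * size.ravel() ** 2 + rng.normal(0, 15, 60)

# The same flexible degree-15 model, fitted with and without an L2 penalty.
# Ridge minimises ||y - Xw||^2 + alpha * ||w||^2, shrinking extreme weights;
# alpha = 1.0 is an arbitrary choice for illustration.
plain = make_pipeline(StandardScaler(), PolynomialFeatures(degree=15),
                      LinearRegression()).fit(size, price)
ridged = make_pipeline(StandardScaler(), PolynomialFeatures(degree=15),
                       Ridge(alpha=1.0)).fit(size, price)

print("largest |weight| without L2 penalty:", np.abs(plain[-1].coef_).max())
print("largest |weight| with    L2 penalty:", np.abs(ridged[-1].coef_).max())
```

scikit-learn's Lasso estimator implements the L1 variant in the same way; swapping it in for Ridge shows how an L1 penalty pushes some weights all the way to zero.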

To find an acceptable bias-variance trade-off, we need to evaluate our model carefully. A common technique, k-fold cross-validation, can help us estimate how well the model performs on unseen data.
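For instance, a minimal k-fold sketch with scikit-learn (the degree-2 Ridge model and the synthetic data are, as before, illustrative assumptions):

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

rng = np.random.default_rng(3)
size = rng.uniform(0.5, 3.5, (80, 1))                 # assumed synthetic data
price = 50 + 40 * size.ravel() + 10 * size.ravel() ** 2 + rng.normal(0, 15, 80)

model = make_pipeline(StandardScaler(), PolynomialFeatures(degree=2), Ridge(alpha=1.0))

# 5-fold cross-validation: each fold is held out once as a validation set
# while the model is trained on the remaining four folds.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
neg_mse = cross_val_score(model, size, price, cv=cv, scoring="neg_mean_squared_error")

print("per-fold MSE :", np.round(-neg_mse, 1))
print("mean MSE     : %.1f (+/- %.1f)" % (-neg_mse.mean(), neg_mse.std()))
```

The spread of the per-fold scores is itself a rough indicator of variance: a model whose error swings widely from fold to fold is reacting strongly to the particular data it was trained on.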
