Ridge and Lasso Regression

Regularization is a process used to keep a model's complexity in check: a model should be no more complex than the data requires. Ridge and Lasso regression are two simple techniques for reducing model complexity and preventing the over-fitting that can arise with plain linear regression.

Linear regression is one of the simplest supervised machine learning algorithms. Ridge and Lasso are two variants of linear regression that apply regularization.

Consider a simple linear regression

y = mx + c

where m is the slope and c is the intercept. Our aim is to find the values of m and c that minimize the cost function.
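As a minimal sketch of this idea, the snippet below fits a line to noisy data generated from y = 2x + 1 and recovers m and c by minimizing the squared error. It uses scikit-learn, which is an assumption; the article itself names no library, and the data are made up for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data drawn from y = 2x + 1 plus a little noise.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 2 * x + 1 + rng.normal(0, 0.1, size=50)

# Ordinary least squares finds the m and c that minimize the
# sum of squared errors between predictions and actual values.
model = LinearRegression().fit(x.reshape(-1, 1), y)
m, c = model.coef_[0], model.intercept_
print(f"m = {m:.2f}, c = {c:.2f}")  # close to the true slope 2 and intercept 1
```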

A regression model that uses the L1 regularization technique is called Lasso regression, and a model that uses L2 is called Ridge regression.

The key difference between the two is the penalty term. In Ridge regression, the cost function is altered by adding a penalty proportional to the sum of the squared magnitudes of the coefficients.

In Lasso regression, the penalty uses the absolute values (magnitudes) of the coefficients instead of their squares. Lasso regression can result in feature selection, whereas Ridge regression only shrinks the coefficients close to zero, never exactly to zero. So Lasso regression not only helps reduce over-fitting but also performs feature selection.

How it Works:

In linear regression, without regularization, we try to find the line (or in general, a hyperplane) that minimizes the sum of squared differences (errors) between our predictions and the actual values. We represent this mathematically as minimizing a loss function:

L(y, ŷ) = Σᵢ (yᵢ − ŷᵢ)²

Now, with regularization, we add a penalty term to this loss function. There are two main types of regularization:

  1. L1 Regularization (Lasso Regression):
    • The penalty is the sum of the absolute values of the model parameters.
    • Mathematically: L(y, ŷ) + λ Σᵢ |wᵢ|
    • This can lead to some model parameters being exactly zero, effectively leading to feature selection.
  2. L2 Regularization (Ridge Regression):
    • The penalty is the sum of the squares of the model parameters.
    • Mathematically: L(y, ŷ) + λ Σᵢ wᵢ²
    • This tends to shrink the parameters but doesn’t generally force them to be exactly zero.

Here, wᵢ are the parameters of your model, λ is the regularization strength (a hyperparameter that you can tune), L(y, ŷ) is the original loss function, y is the true output, and ŷ is the model's prediction.
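The contrast between the two penalties can be seen directly by fitting both models on the same data. The sketch below uses scikit-learn's `Lasso` and `Ridge` (an assumed library choice) on made-up data where only the first two of ten features actually matter; note that scikit-learn calls λ `alpha`.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(42)
X = rng.normal(size=(100, 10))
# Only the first two features influence y; the other eight are noise.
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(0, 0.1, size=100)

# scikit-learn's name for the regularization strength λ is alpha.
lasso = Lasso(alpha=0.1).fit(X, y)
ridge = Ridge(alpha=0.1).fit(X, y)

# L1 (Lasso) drives the irrelevant coefficients to exactly zero,
# while L2 (Ridge) only shrinks them toward zero.
print("Lasso coefficients set to zero:", int(np.sum(lasso.coef_ == 0)))
print("Ridge coefficients set to zero:", int(np.sum(ridge.coef_ == 0)))
```

The zero coefficients in the Lasso fit are exactly the automatic feature selection described above.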

Tuning Regularization:

  • The λ in the equations is like a knob you can turn to adjust the amount of regularization you want to apply:
    • If λ=0, there is no regularization, and you might risk overfitting if your model is too complex.
    • If λ is very large, the penalty for complex models is also very large, and you might end up with an overly simple model that underfits the data.
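The effect of turning this λ "knob" can be sketched by refitting Ridge regression with increasing values of `alpha` (scikit-learn's name for λ; the library and data here are assumptions for illustration) and watching the overall size of the coefficient vector shrink.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
y = X @ np.array([4.0, -3.0, 2.0, 0.5, 0.0]) + rng.normal(0, 0.1, size=50)

# Larger alpha (λ) means a stronger penalty, so the coefficients
# are pulled harder toward zero.
norms = []
for alpha in [0.01, 1.0, 100.0]:
    coef = Ridge(alpha=alpha).fit(X, y).coef_
    norms.append(float(np.linalg.norm(coef)))
    print(f"alpha={alpha:>6}: ||w|| = {norms[-1]:.3f}")
```

At very small alpha the fit is close to ordinary least squares; at very large alpha the coefficients are squashed toward zero and the model risks underfitting.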

Why is Regularization Useful?

  • It discourages overly complex models which overfit the training data.
  • It can help with feature selection (especially L1 / Lasso).
  • It often results in more generalizable models — that is, models that perform better on unseen data.

L1 regularization is valuable when you suspect that many features are irrelevant or redundant, and you want to automatically identify and remove them. It’s also helpful when you want a sparse model with only a subset of features.

L2 regularization is beneficial when you want to prevent overfitting but don’t necessarily want to perform feature selection. It helps control the magnitude of the coefficients, making the model more stable, particularly when features are correlated with one another.

So, in simple terms, regularization is a technique used in machine learning to prevent overfitting by adding a penalty to the loss function, which discourages overly complex models. It encourages the model to be simpler and more general.

These regression techniques are best suited to problems with a large set of features. Traditional approaches to handling overfitting and performing feature selection, such as stepwise regression, work well only with a small set of features.
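In practice, the tuning of λ itself is usually automated with cross-validation. As a sketch (again assuming scikit-learn and made-up data with many irrelevant features), `LassoCV` tries a grid of candidate alphas and picks the one that generalizes best:

```python
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 20))
# Only the first of twenty features drives the target.
y = 5 * X[:, 0] + rng.normal(0, 0.5, size=200)

# LassoCV selects alpha (λ) by 5-fold cross-validation over the grid.
model = LassoCV(alphas=[0.001, 0.01, 0.1, 1.0], cv=5).fit(X, y)
print("chosen alpha:", model.alpha_)
print("nonzero coefficients:", int(np.sum(model.coef_ != 0)))
```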
