Logistic Regression

Logistic regression is a supervised classification algorithm that predicts the probability of a categorical dependent variable from a given set of independent variables. It is a predictive analysis algorithm based on the concept of probability, and it is most commonly used for binary classification problems.

Examples of classification problems include email (spam or not spam), online transactions (fraud or not fraud), and tumors (malignant or benign). Logistic regression transforms its output using the logistic sigmoid function to return a probability value.

Logistic regression is closely related to linear regression: it is a generalized linear model that passes a linear combination of the inputs through the logistic function to predict the probability of an outcome.

The three main differences between logistic and linear regression are:

1. The dependent (response) variable Y is continuous in linear regression, whereas in logistic regression it is categorical (discrete).

2. The cost function in linear regression minimizes the sum of squared errors, Sum((Actual(Y) - Predicted(Y))^2), whereas logistic regression estimates its coefficients by maximum likelihood, that is, by maximizing the probability of the observed labels.

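3. In linear regression, where the feature variables (X) can take any value, the output (Y) can range continuously from negative to positive infinity. In logistic regression, the output is a probability and is therefore confined to the range between 0 and 1.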
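The second difference can be made concrete with a small sketch. The function names and the sample numbers below are illustrative assumptions, not taken from any library: one function computes the sum of squared errors used by linear regression, the other the negative log-likelihood (log loss) that maximum likelihood estimation effectively minimizes for binary labels.

```python
import math

def sum_squared_error(actual, predicted):
    """Linear-regression cost: sum of squared differences."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted))

def log_loss(actual, predicted_prob, eps=1e-12):
    """Negative log-likelihood of binary labels (0 or 1) under predicted probabilities."""
    total = 0.0
    for y, p in zip(actual, predicted_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip so that log() never receives 0
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total

# Made-up labels and two sets of predicted probabilities for illustration.
labels = [1, 0, 1]
confident = [0.9, 0.1, 0.8]
hesitant = [0.6, 0.4, 0.5]

print("log loss, confident predictions:", log_loss(labels, confident))
print("log loss, hesitant predictions: ", log_loss(labels, hesitant))
```

Maximizing the likelihood is the same as minimizing this log loss, so predictions that put more probability on the correct labels score lower.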

Logistic regression can be explained with the logistic function, also known as the sigmoid function. The sigmoid curve takes values close to 0 for large negative inputs, values close to 1 for large positive inputs, and intermediate values in between, which makes it a good choice for modelling the probability of occurrence of the target variable.

The logistic function is as defined below:

$$f(z) = \frac{1}{1 + e^{-z}}$$

Here e is the base of the natural logarithm (Euler’s number, or the EXP() function in your spreadsheet) and z is the actual numerical value that you want to transform. As z goes to positive infinity, the predicted value of Y approaches 1, and as z goes to negative infinity, the predicted value of Y approaches 0. The values of the logistic function therefore lie between 0 and 1, while z itself can vary from −∞ to +∞.
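As a minimal sketch, the logistic function can be written in a few lines of Python. The name sigmoid and the two-branch form are choices made here for illustration; the branch for negative z only avoids overflow in math.exp for very large negative inputs and computes the same value.

```python
import math

def sigmoid(z):
    """Logistic function 1 / (1 + e^(-z)), arranged to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)  # safe: z is negative, so exp(z) is at most 1
    return ez / (1.0 + ez)

# The output moves from near 0 to near 1 as z increases, passing 0.5 at z = 0.
for z in (-10, -1, 0, 1, 10):
    print(z, sigmoid(z))
```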

In general, the formula for logistic regression is given by the following expression:

$$f(z) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k)}}$$

The conditional probability can be given as:

$$P(Y = y \mid X_1, X_2, X_3, \dots, X_k)$$

Here Y is the target variable and y is one of its discrete values. The expression is the probability that the target variable takes that value (either 0 or 1 in binary classification problems) when the values of the independent variables are given.
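To illustrate how this probability is computed for one instance, here is a small sketch. The coefficient values, the feature values, and the 0.5 decision threshold are all hypothetical choices for the example; in practice the coefficients come from fitting the model to data.

```python
import math

def predict_probability(intercept, coefficients, features):
    """P(Y = 1 | X) for one instance: the logistic function of the linear combination."""
    z = intercept + sum(b * x for b, x in zip(coefficients, features))
    return 1.0 / (1.0 + math.exp(-z))  # fine for moderate z; very negative z would overflow

# Hypothetical fitted coefficients and one hypothetical instance.
beta0 = -1.5
betas = [0.8, -0.4]
x = [2.0, 1.0]

p = predict_probability(beta0, betas, x)  # z = -1.5 + 1.6 - 0.4 = -0.3
label = 1 if p >= 0.5 else 0              # 0.5 is a common but arbitrary threshold
print("P(Y = 1 | X) =", p, "-> predicted class", label)
```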

Odds is the ratio of the probability of an event occurring to the probability of the event not occurring. The logistic model outputs the logits, i.e. the log odds, and the logistic function outputs the probabilities.

Logistic model:

$$\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \dots + \beta_n X_n$$

The output of this expression is the logits.

Logistic function:

$$f(z) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \dots + \beta_n X_n)}}$$

The output, in this case, is the probabilities.
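The link between probabilities, odds, and logits can be checked numerically. The helper names below (odds, logit, logistic) are illustrative and are not tied to any particular library.

```python
import math

def odds(p):
    """Probability of the event divided by the probability of no event."""
    return p / (1.0 - p)

def logit(p):
    """Log odds of a probability p strictly between 0 and 1."""
    return math.log(p / (1.0 - p))

def logistic(z):
    """Maps a logit back to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

p = 0.8
print("odds:", odds(p))                      # probability 0.8 against 0.2
print("logit:", logit(p))
print("round trip:", logistic(logit(p)))     # recovers p up to rounding
```

The round trip shows that the logistic function is the inverse of the logit: applying one after the other returns the starting probability.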

β0 is the baseline in a logistic regression model. It is the log odds for an instance when all the attributes (X1,X2,X3,…,Xn) are zero. In practical scenarios, the probability of all the attributes being zero is very low. In another interpretation, β0 is the log odds for an instance when none of the attributes is taken into consideration.

Each of the other betas is the amount by which the log odds change for a unit change in the corresponding attribute, keeping all other attributes (the control variables) fixed.
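This interpretation can be verified with a short sketch. The coefficients and feature values are hypothetical. Increasing the first attribute by one unit while holding the other fixed changes the log odds by exactly the first coefficient, which means the odds are multiplied by e raised to that coefficient.

```python
import math

def log_odds(intercept, coefficients, features):
    """The linear part of the model: beta0 + beta1*x1 + ... + betan*xn."""
    return intercept + sum(b * x for b, x in zip(coefficients, features))

# Hypothetical coefficients for a model with two attributes.
beta0 = -1.5
betas = [0.8, -0.4]

base = [2.0, 1.0]
bumped = [3.0, 1.0]  # first attribute raised by one unit, second unchanged

delta = log_odds(beta0, betas, bumped) - log_odds(beta0, betas, base)
odds_ratio = math.exp(delta)

print("change in log odds:", delta)           # equals betas[0] up to rounding
print("multiplicative change in odds:", odds_ratio)
```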

Therefore, in logistic regression, a linear combination of the inputs is mapped to the log odds of the output being equal to 1. That means the independent variables are linearly related to the log odds.
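A short derivation shows why the relationship is linear. Here p is shorthand (introduced for this sketch) for the probability that the output equals 1, and z stands for the linear combination of the inputs:

```latex
\begin{align*}
p &= \frac{1}{1 + e^{-z}} \\
1 - p &= \frac{e^{-z}}{1 + e^{-z}} \\
\frac{p}{1 - p} &= \frac{1}{e^{-z}} = e^{z} \\
\log\frac{p}{1 - p} &= z = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_n X_n
\end{align*}
```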

Types of Logistic Regression:

1. Binary Logistic Regression: The categorical response has only two possible outcomes. E.g.: Spam or Not Spam

2. Multinomial Logistic Regression: Three or more categories without ordering. E.g.: Predicting which food is preferred more (Veg, Non-Veg, Vegan)

3. Ordinal Logistic Regression: Three or more categories with ordering. E.g.: Movie rating from 1 to 5

Advantages

  • One of the simplest machine learning algorithms, yet efficient to train and to use for prediction.
  • Variance is low.
  • It can also be used for feature extraction.
  • Logistic models can be updated easily with new data using stochastic gradient descent.
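The last advantage above can be sketched as a single stochastic gradient descent step on the log loss. This is a minimal illustration under stated assumptions, not a production implementation: the function name sgd_update, the learning rate of 0.1, and the three-example stream are all made up for the example.

```python
import math

def sgd_update(weights, bias, features, label, learning_rate=0.1):
    """One SGD step on the log loss for a single (features, label) example."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    p = 1.0 / (1.0 + math.exp(-z))
    error = p - label  # gradient of the log loss with respect to z
    new_weights = [w - learning_rate * error * x for w, x in zip(weights, features)]
    new_bias = bias - learning_rate * error
    return new_weights, new_bias

# Start from an untrained model and feed in new examples one at a time.
weights, bias = [0.0, 0.0], 0.0
stream = [([1.0, 2.0], 1), ([2.0, 0.5], 0), ([0.5, 3.0], 1)]  # made-up data
for features, label in stream:
    weights, bias = sgd_update(weights, bias, features, label)

print("weights:", weights, "bias:", bias)
```

Because each step uses only one example, new data can be folded into an existing model without retraining on the full history.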

Disadvantages

  • It doesn’t handle a large number of categorical variables well.
  • It requires transformation of non-linear features.
  • It is not flexible enough to naturally capture more complex relationships.
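The point about non-linear features can be illustrated with a toy case. Suppose the positive class lies inside an interval around zero, which no single threshold on x can separate. Adding x squared as an extra feature lets a linear model in the transformed features express it. The helper name add_squared_term and the coefficients below are hypothetical and chosen by hand, not fitted.

```python
import math

def add_squared_term(x):
    """Manual feature transformation: expand a single value x into [x, x^2]."""
    return [x, x * x]

def predict_probability(intercept, coefficients, features):
    z = intercept + sum(b * f for b, f in zip(coefficients, features))
    return 1.0 / (1.0 + math.exp(-z))

# Hand-picked coefficients giving z = 1 - x^2, so the probability exceeds 0.5 only when |x| < 1.
beta0 = 1.0
betas = [0.0, -1.0]

for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
    p = predict_probability(beta0, betas, add_squared_term(x))
    print(x, p, "inside" if p > 0.5 else "outside")
```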
