Naive Bayes vs Logistic Regression

Naive Bayes is a linear classifier that applies Bayes' theorem together with a strong (naive) independence assumption among the features. Given a data set with n features F_1, ..., F_n, Naive Bayes models the probability of the output Y given the features F_i as

  P(Y | F_1, ..., F_n) = P(Y) * P(F_1 | Y) * ... * P(F_n | Y) / P(F_1, ..., F_n)

This follows from Bayes' theorem, which states that

  P(A | B) = P(B | A) * P(A) / P(B)

combined with the independence assumption P(F_1, ..., F_n | Y) = P(F_1 | Y) * ... * P(F_n | Y). Since the denominator does not depend on Y, prediction reduces to choosing the class that maximises P(Y) * P(F_1 | Y) * ... * P(F_n | Y).
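To make the computation concrete, below is a minimal from-scratch sketch of these formulas for categorical features. The toy data, labels, and helper names are invented for illustration, and a real implementation would add Laplace smoothing to avoid zero probabilities.

```python
# A minimal sketch of the Naive Bayes posterior for categorical features.
from collections import Counter, defaultdict

def fit_naive_bayes(X, y):
    """Estimate P(Y) and P(F_i | Y) from counts."""
    priors = Counter(y)
    likelihoods = defaultdict(Counter)  # (feature index, class) -> value counts
    for features, label in zip(X, y):
        for i, value in enumerate(features):
            likelihoods[(i, label)][value] += 1
    return priors, likelihoods

def predict(x, priors, likelihoods, n):
    """Score each class by P(Y) * prod_i P(F_i | Y); the denominator
    P(F_1, ..., F_n) is constant across classes, so it can be dropped."""
    scores = {}
    for label, count in priors.items():
        score = count / n
        for i, value in enumerate(x):
            score *= likelihoods[(i, label)][value] / count
        scores[label] = score
    return max(scores, key=scores.get)

# Toy data: two binary features, binary label (invented for illustration).
X = [(1, 0), (1, 1), (0, 0), (0, 1), (1, 0)]
y = ["spam", "spam", "ham", "ham", "spam"]
priors, likelihoods = fit_naive_bayes(X, y)
print(predict((1, 0), priors, likelihoods, len(X)))  # -> "spam"
```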

Logistic regression is a linear classification method that learns the probability of a sample belonging to a given class by finding the decision boundary that best separates the classes. It is mainly used when the output is binary; multi-class (multinomial) logistic regression extends it to outcomes with more than two values.
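As a rough illustration of how that boundary is learned, here is a minimal sketch of binary logistic regression trained with gradient descent; the learning rate, iteration count, and toy data are arbitrary choices made up for this example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, lr=0.1, n_iter=1000):
    """Learn weights w and bias b by minimising the mean log loss."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(n_iter):
        p = sigmoid(X @ w + b)           # predicted P(y=1 | x)
        grad_w = X.T @ (p - y) / len(y)  # gradient of the mean log loss
        grad_b = np.mean(p - y)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Toy linearly separable data: class 1 has larger feature values.
X = np.array([[0.0, 0.5], [1.0, 1.5], [2.0, 2.5], [3.0, 3.5]])
y = np.array([0, 0, 1, 1])
w, b = fit_logistic(X, y)
print(sigmoid(X @ w + b).round(2))  # low for class-0 rows, high for class-1
```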

Comparison between the two algorithms:

1. Model assumptions

  • Naive Bayes assumes all the features are conditionally independent given the class.
  • Logistic regression splits the feature space linearly and typically works reasonably well even when some of the variables are correlated (see the sketch after this list).
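The following rough sketch shows what the independence assumption means in practice: duplicating a feature (correlation of exactly 1) double-counts its evidence in Naive Bayes and pushes the predicted probabilities towards 0 or 1, while logistic regression simply re-weights. The synthetic data is invented for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=2, n_informative=2,
                           n_redundant=0, random_state=0)
X_dup = np.hstack([X, X[:, :1]])  # copy the first feature -> correlation 1.0

for name, data in [("original", X), ("duplicated", X_dup)]:
    nb = GaussianNB().fit(data, y)
    lr = LogisticRegression().fit(data, y)
    # Naive Bayes double-counts the duplicated evidence, so its
    # probabilities move towards the extremes; logistic regression's
    # stay comparatively stable.
    print(name, nb.predict_proba(data)[:3, 1].round(2),
          lr.predict_proba(data)[:3, 1].round(2))
```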

2. Learning mechanism

  • Naive Bayes is a generative model: it models the joint distribution of the features X and the target Y, and then derives the posterior probability P(y|x), i.e. the probability of class y given that the features x have been observed.
  • Logistic regression is a discriminative model: it models the posterior probability P(y|x) directly, learning the input-to-output mapping by minimising the log loss (equivalently, maximising the conditional likelihood). The sketch after this list contrasts what each model stores.
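A small sketch of this contrast, using scikit-learn estimators on synthetic data (invented for illustration): after fitting, Naive Bayes exposes the generative quantities P(Y) and per-class feature distributions, while logistic regression exposes only the boundary weights of P(y|x).

```python
from sklearn.datasets import make_classification
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=3, n_informative=3,
                           n_redundant=0, random_state=1)

nb = GaussianNB().fit(X, y)
# Generative: Naive Bayes estimates P(Y) and the per-class feature
# distributions P(F_i | Y), from which it derives P(y | x).
print("NB priors P(Y):", nb.class_prior_.round(2))
print("NB per-class feature means:", nb.theta_.round(2))

lr = LogisticRegression().fit(X, y)
# Discriminative: logistic regression learns only the decision
# boundary, i.e. the weights of P(y | x) directly.
print("LR weights:", lr.coef_.round(2), "bias:", lr.intercept_.round(2))
```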

3. Approaches to improving model results

  • Naive Bayes: when the training data size is small relative to the number of features, information about the prior probabilities P(Y) helps improve the results.
  • Logistic regression: when the training data size is small relative to the number of features, adding regularisation such as L1 (Lasso) or L2 (Ridge) penalties helps reduce overfitting and yields a more generalised model. A sketch of both strategies follows.
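A hedged sketch of both strategies on a deliberately small, wide data set; the penalty strengths and the assumed class prior below are illustrative choices, not recommendations.

```python
from sklearn.datasets import make_classification
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=50, n_features=20, n_informative=5,
                           random_state=2)  # few samples, many features

# Naive Bayes: inject domain knowledge through the class priors
# (the 0.7/0.3 split is an assumed prior, purely for illustration).
nb = GaussianNB(priors=[0.7, 0.3]).fit(X, y)

# Logistic regression: L1 (Lasso-style) or L2 (Ridge-style) penalties
# shrink the weights to reduce overfitting; smaller C = stronger penalty.
lr_l1 = LogisticRegression(penalty="l1", solver="liblinear", C=0.5).fit(X, y)
lr_l2 = LogisticRegression(penalty="l2", C=0.5).fit(X, y)
print("non-zero L1 weights:", (lr_l1.coef_ != 0).sum(), "of", lr_l1.coef_.size)
```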
