Naive Bayes

Naive Bayes is a very popular supervised classification algorithm. It is called “naive” because it makes the naive assumption that each feature is independent of the others, an assumption that is nearly impossible to satisfy in real-life data sets. Bayes’ theorem is the foundation of the Naive Bayes algorithm…
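As a quick illustration of the idea, here is a minimal from-scratch Gaussian Naive Bayes sketch. The one-feature training values are made up for the example; with several features, the per-feature likelihoods would simply be multiplied together under the naive independence assumption.

```python
import math

# Hypothetical toy data: one feature per sample, grouped by class label.
data = {
    0: [1.0, 1.2, 0.8],   # class 0 feature values
    1: [3.0, 3.2, 2.8],   # class 1 feature values
}

def gaussian(x, mu, var):
    """Gaussian likelihood P(x | class) with mean mu and variance var."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def predict(x):
    """Pick the class with the highest prior * likelihood score."""
    best_class, best_score = None, -1.0
    total = sum(len(v) for v in data.values())
    for c, values in data.items():
        mu = sum(values) / len(values)
        var = sum((v - mu) ** 2 for v in values) / len(values)
        prior = len(values) / total
        # With multiple features, independence lets us multiply likelihoods here.
        score = prior * gaussian(x, mu, var)
        if score > best_score:
            best_class, best_score = c, score
    return best_class

print(predict(1.1))  # near class 0's mean, so class 0
print(predict(2.9))  # near class 1's mean, so class 1
```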

Continue reading

Logistic Regression

Logistic regression is a supervised classification algorithm used to predict the probability of a categorical dependent variable from a given set of independent variables. It is a predictive analysis algorithm and is based on the concept of probability. The most common use of logistic regression models is in binary classification problems. Some…
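The probability idea can be sketched in a few lines: a sigmoid turns a linear score into a probability, and the weights are fit by gradient descent on the log-loss. The one-feature data set here is hypothetical.

```python
import math

def sigmoid(z):
    """Map a linear score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical 1-D data: negative x -> class 0, positive x -> class 1.
xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
ys = [0, 0, 0, 1, 1, 1]

w, b, lr = 0.0, 0.0, 0.5
for _ in range(2000):                 # simple batch gradient descent
    dw = db = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(w * x + b)
        dw += (p - y) * x             # gradient of the log-loss w.r.t. w
        db += (p - y)                 # gradient w.r.t. b
    w -= lr * dw / len(xs)
    b -= lr * db / len(xs)

print(sigmoid(w * 2 + b))   # predicted probability of class 1 for x = 2
```

The model outputs a probability; a threshold (commonly 0.5) turns it into a class label.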

Continue reading

Gradient Descent

Gradient Descent is an optimization algorithm used to find the parameter values of a function that minimize the cost function. The average of the squared differences between the predicted values of y and the actual values of y is called a cost function. It is also called…
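The update rule is easiest to see on a one-variable example. This sketch minimizes the hypothetical function f(x) = (x − 3)², whose gradient is 2(x − 3), by repeatedly stepping against the gradient.

```python
# Minimize f(x) = (x - 3)^2 with gradient descent.
x, lr = 0.0, 0.1          # starting point and learning rate
for _ in range(100):
    grad = 2 * (x - 3)    # derivative of (x - 3)^2
    x -= lr * grad        # step in the direction of steepest descent
print(x)                  # converges toward the minimum at x = 3
```

The same loop, with the gradient of the cost function with respect to each parameter, is what fits models like linear and logistic regression.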

Continue reading

Linear Regression

Linear regression is a supervised machine learning algorithm used for modeling the relationship between a dependent variable and one or more independent variables by fitting a linear equation. I would like to say it is the starting point of anyone’s ML journey! Linear regression is the simplest and most widely…
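For one independent variable, the best-fit line has a closed-form least-squares solution, sketched below on a hypothetical data set that lies exactly on y = 2x + 1.

```python
# Fit y = a*x + b by closed-form ordinary least squares on toy data.
xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]            # generated from y = 2x + 1
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
# Slope: covariance of x and y divided by the variance of x.
a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
b = my - a * mx                 # the fitted line passes through (mx, my)
print(a, b)                     # recovers slope 2 and intercept 1
```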

Continue reading

Confidence Intervals

A confidence interval is a range of values within which we are fairly sure the true value lies. It is calculated from the sample data and gives an interval estimate, as opposed to a point estimate. The confidence level, often expressed as a percentage (e.g., 95% or 99%), quantifies the level…
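A 95% confidence interval for a mean can be computed as mean ± z · (s / √n). The sample below is hypothetical, and for simplicity the normal critical value 1.96 is used; with a small sample, a t critical value would be more appropriate.

```python
import math
import statistics

# Hypothetical sample of measurements.
sample = [4.8, 5.1, 5.0, 4.9, 5.2, 5.0, 4.7, 5.3]
mean = statistics.mean(sample)
se = statistics.stdev(sample) / math.sqrt(len(sample))   # standard error of the mean
lo, hi = mean - 1.96 * se, mean + 1.96 * se              # 95% interval
print(f"95% CI: ({lo:.3f}, {hi:.3f})")
```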

Continue reading

Key Statistical Tests

In the world of data science, statistical tests play a crucial role in drawing meaningful insights from data, making informed decisions, and validating hypotheses. Let’s explore five essential statistical tests: the Z-test, t-test, chi-squared test, ANOVA, and the lesser-known but powerful Fisher’s Exact Test. 1. Z-test: Unleash the Power of…
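Taking the first of those tests as an example, a one-sample z-test can be computed directly from its definition. The sample and the assumed known population standard deviation below are hypothetical; the other four tests follow the same pattern of statistic plus p-value.

```python
import math
import statistics

# One-sample z-test sketch: does the sample mean differ from mu0 = 50?
sample = [52, 51, 49, 53, 50, 54, 52, 51, 53, 50]
mu0, sigma = 50, 2.0          # hypothetical null mean and known population sigma
z = (statistics.mean(sample) - mu0) / (sigma / math.sqrt(len(sample)))
# Two-sided p-value from the standard normal CDF (via math.erf).
p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
print(z, p)
```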

Continue reading

Hypothesis Testing

Hypothesis testing is a method statisticians use to make decisions or inferences about populations based on sample data. It is a core concept in statistics: a structured, methodical way to put our claims to the test, demanding evidence…
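The structure — null hypothesis, test statistic, p-value, decision — can be sketched with an exact binomial test for coin fairness. The observed counts are hypothetical.

```python
from math import comb

# H0: the coin is fair (p = 0.5). Hypothetical observation: 58 heads in 100 flips.
n, heads = 100, 58
# One-sided tail probability P(X >= 58) under the null, from the exact binomial.
p_one = sum(comb(n, k) for k in range(heads, n + 1)) / 2 ** n
p_two = min(1.0, 2 * p_one)        # symmetric two-sided p-value
alpha = 0.05
decision = "reject H0" if p_two < alpha else "fail to reject H0"
print(p_two, decision)
```

Here 58 heads is not extreme enough at the 5% level, so the null of a fair coin is not rejected.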

Continue reading

Heteroscedasticity

A random variable is said to be heteroscedastic when different subpopulations have different variabilities (standard deviations). One of the basic assumptions of linear regression is that the data should be homoscedastic, i.e., heteroscedasticity should not be present in the data. When this assumption is violated, the Ordinary Least Squares (OLS) estimators…
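A rough diagnostic in the spirit of the Goldfeld–Quandt test is to compare the residual variance in the lower and upper halves of the data, sorted by x. The data below are hypothetical, constructed so the spread of residuals grows with x.

```python
import statistics

# Hypothetical data around the true line y = 2x, with residual spread growing in x.
xs = list(range(1, 21))
ys = [2 * x + (x * 0.5 if x % 2 else -x * 0.5) for x in xs]
residuals = [y - 2 * x for x, y in zip(xs, ys)]   # residuals from the true line

half = len(residuals) // 2
v_low = statistics.variance(residuals[:half])     # variance in the low-x half
v_high = statistics.variance(residuals[half:])    # variance in the high-x half
print(v_high / v_low)   # a ratio well above 1 suggests heteroscedasticity
```

A formal version would fit separate regressions to each half and compare residual sums of squares with an F-test.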

Continue reading

A/B Testing

A/B testing is a basic randomized controlled experiment. It is a way to compare two versions of a variable to find out which performs better in a controlled environment. A/B testing is also known as bucket testing or split-run testing. Suppose we want to add some functionality to an existing product. A/B testing…
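The comparison between the two versions typically comes down to a two-proportion z-test on conversion rates. The visitor and conversion counts below are hypothetical.

```python
import math

# Hypothetical A/B test results.
n_a, conv_a = 1000, 110   # version A: 11.0% conversion
n_b, conv_b = 1000, 145   # version B: 14.5% conversion

p_a, p_b = conv_a / n_a, conv_b / n_b
p_pool = (conv_a + conv_b) / (n_a + n_b)                  # pooled rate under H0
se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
z = (p_b - p_a) / se
# Two-sided p-value from the standard normal CDF (via math.erf).
p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
print(z, p_value)   # a small p-value suggests B's lift is not due to chance
```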

Continue reading