Probability Distribution

A probability distribution describes how likely each possible outcome of an experiment is. More precisely, it’s a mathematical function that assigns a probability to each possible outcome. Types of Probability…

Bootstrapping

Bootstrapping is a resampling method that draws repeated samples (called ‘bootstrap samples’) from a dataset with replacement. It is used to estimate the sampling distribution of a statistic, to construct confidence intervals, and to carry out significance tests. Here is the basic procedure: Draw a Sample: Randomly select n observations from the…
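To make the resampling idea concrete, here is a minimal sketch of a percentile bootstrap for the mean. The helper name `bootstrap_ci`, the toy `sample` values, the number of resamples, and the choice of the percentile method are all assumptions made for illustration; the full post may describe the procedure differently.

```python
import random
import statistics

def bootstrap_ci(data, stat=statistics.mean, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for `stat` (hypothetical helper, for illustration)."""
    rng = random.Random(seed)
    n = len(data)
    # Each bootstrap sample draws n observations from the data WITH replacement,
    # then the statistic is recomputed on that resample.
    estimates = sorted(stat(rng.choices(data, k=n)) for _ in range(n_resamples))
    # Cut off alpha/2 of the estimates in each tail.
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = n_resamples - lo_idx - 1
    return estimates[lo_idx], estimates[hi_idx]

# Made-up data, used only to exercise the function.
sample = [12.1, 9.8, 11.4, 10.3, 13.0, 9.5, 10.9, 11.7, 12.4, 10.1]
low, high = bootstrap_ci(sample)
print(f"95% bootstrap interval for the mean: ({low:.2f}, {high:.2f})")
```

The sorted list of resampled means acts as an empirical stand-in for the sampling distribution of the statistic, which is why its percentiles can serve as interval endpoints.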

Confidence Intervals

A confidence interval is a range of values, calculated from sample data, that is likely to contain the true value of a population parameter. It gives an interval estimate, as opposed to a point estimate. The confidence level, often expressed as a percentage (e.g., 95% or 99%), quantifies the level…
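As a rough illustration of an interval estimate, the sketch below computes a normal-approximation confidence interval for a mean. The function name `normal_ci` and the toy data are hypothetical, and the normal approximation is itself an assumption: for small samples a t-based interval is usually more appropriate.

```python
import math
import statistics
from statistics import NormalDist

def normal_ci(data, confidence=0.95):
    """Normal-approximation confidence interval for the mean (illustrative only)."""
    n = len(data)
    mean = statistics.fmean(data)
    # Standard error of the mean, using the sample standard deviation.
    se = statistics.stdev(data) / math.sqrt(n)
    # Two-sided critical value, roughly 1.96 for a 95% confidence level.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean - z * se, mean + z * se

# Made-up measurements, used only to exercise the function.
data = [4.9, 5.3, 5.1, 4.7, 5.6, 5.0, 5.2, 4.8, 5.4, 5.1]
low95, high95 = normal_ci(data, confidence=0.95)
low99, high99 = normal_ci(data, confidence=0.99)
print(f"95% CI: ({low95:.3f}, {high95:.3f})")
print(f"99% CI: ({low99:.3f}, {high99:.3f})")
```

Raising the confidence level from 95% to 99% widens the interval: more confidence that the interval captures the true value costs precision.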

Key Statistical Tests

In the world of data science, statistical tests play a crucial role in drawing meaningful insights from data, making informed decisions, and validating hypotheses. Let’s explore four essential statistical tests: the Z-test, t-test, chi-squared test, and ANOVA. 1. Z-test: Unleash the Power of the Standard Score The Z-test is a…

Hypothesis Testing

Hypothesis testing is a core statistical method for making decisions or inferences about populations based on sample data. It’s a structured, methodical way to put our claims to the test, demanding evidence…

A/B Testing

A/B testing is a basic randomized controlled experiment. It is a way to compare two versions of a variable to find out which performs better in a controlled environment. A/B testing is also known as bucket testing or split-run testing. Suppose we want to add some functionalities to an existing product. A/B testing…

Central Limit Theorem

The central limit theorem (CLT) is a cornerstone of inferential statistics. By collecting a sample from a population and applying statistics, we can draw conclusions about that population. The CLT says that the mean of the sampling distribution of the sample means is equal to the population mean irrespective of the distribution…
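A small simulation can make this tangible. The sketch below repeatedly draws samples from a skewed exponential population and looks at the resulting sample means. The population (mean 2), the sample size of 50, and the 5000 repetitions are arbitrary choices made for illustration, not values from the post.

```python
import math
import random
import statistics

rng = random.Random(42)

# Assumed population: exponential with mean 2 (its standard deviation is also 2).
population_mean = 2.0
population_sd = 2.0

def sample_mean(n):
    """Mean of n independent draws from the assumed exponential population."""
    draws = [rng.expovariate(1 / population_mean) for _ in range(n)]
    return statistics.fmean(draws)

n = 50
means = [sample_mean(n) for _ in range(5000)]

mean_of_means = statistics.fmean(means)
sd_of_means = statistics.stdev(means)
expected_sd = population_sd / math.sqrt(n)  # standard error predicted by theory

print(f"mean of sample means: {mean_of_means:.3f} (population mean {population_mean})")
print(f"sd of sample means:   {sd_of_means:.3f} (sigma / sqrt(n) = {expected_sd:.3f})")
```

Even though the population is strongly skewed, the sample means cluster around the population mean, and their spread is close to sigma / sqrt(n). A histogram of `means` would look approximately bell-shaped.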

Difference between Statistics and ML

Before we discuss Linear Regression for ML, let us talk a little about Statistics. Many of us might have learnt linear regression in Statistics classes. Are ML and Statistics the same? No, they are not! ML builds heavily on Statistics, that’s all! Both Machine Learning and statistical modelling learn from…
