A/B Testing

A/B testing is a basic randomized controlled experiment. It is a way to compare two versions of a variable to find out which performs better in a controlled environment.

A/B testing is also known as bucket testing or split-run testing.

Suppose we want to add some functionality to an existing product. A/B testing is a standard method for quantifying the impact of such a change: it lets us measure the effect of the newly added functionality and compare the performance of the original and modified versions of the product.

A/B testing is a form of statistical, two-sample hypothesis testing. Statistical hypothesis testing is a method in which sample data are used to test a claim about a population. Two-sample hypothesis testing is a method of determining whether the difference between two samples is statistically significant.

Typically, two consumer groups (Control and Variant) are exposed to two different versions of the same product or feature to see if there is a statistically significant difference in the metrics of interest.
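In practice, users are often split into the Control and Variant groups by hashing a user identifier, which keeps each user's assignment stable across sessions. The sketch below is a minimal illustration, assuming users are identified by a string `user_id`; the experiment name, the SHA-256 hash, and the 50/50 split are assumptions made for the example, not requirements.

```python
import hashlib

def assign_bucket(user_id: str, experiment: str = "new-feature") -> str:
    """Deterministically assign a user to 'control' or 'variant'.

    Hashing the (experiment, user_id) pair gives each user an equal
    chance of landing in either bucket while keeping the assignment
    stable across sessions.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    # Map the hash to a number in [0, 100) and split 50/50 between groups.
    return "control" if int(digest, 16) % 100 < 50 else "variant"

# Example: assign a few users
for uid in ["alice", "bob", "carol"]:
    print(uid, assign_bucket(uid))
```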

Steps involved in conducting A/B testing:

  • Formulate the null and alternative hypotheses.

Null hypothesis: There is no difference between the two versions.

Alternative hypothesis: There is a difference between the two versions.

  • Create a control group and a test group.

There are two important concepts to consider in this step: random sampling and sample size.

Random sampling is a technique where each sample in a population has an equal chance of being chosen. Random sampling is important in hypothesis testing because it eliminates sampling bias. It is important to eliminate bias because we want the results of the A/B test to be representative of the entire population rather than the sample itself.

Also, we have to determine the minimum sample size for the A/B test prior to conducting it so that we can eliminate undercoverage bias, that is, bias that arises from sampling too few observations (see the power-analysis sketch below).

  • Conduct the test, compare the results, and reject or do not reject the null hypothesis.

There are a few steps to determine whether the difference between the control group and the variant group is statistically significant.

  • First, we have to set alpha, the probability of making a Type I error. Typically, alpha is set at 5% or 0.05.
  • Next, we have to determine the probability value (p-value) by first calculating the t-statistic (one common form of the two-sample t-statistic is shown after this list).
  • Lastly, compare the p-value to the alpha. If the p-value is less than the alpha, then reject the null hypothesis.
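For reference, the test statistic in the last step is typically a two-sample t-statistic. One commonly used form is Welch's version, which does not assume equal variances in the two groups:

$$ t = \frac{\bar{x}_A - \bar{x}_B}{\sqrt{\dfrac{s_A^2}{n_A} + \dfrac{s_B^2}{n_B}}} $$

where $\bar{x}_A$ and $\bar{x}_B$ are the sample means, $s_A^2$ and $s_B^2$ the sample variances, and $n_A$ and $n_B$ the sample sizes of the control and variant groups. The p-value is then obtained from the t-distribution with the corresponding degrees of freedom.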

The term significance level (alpha) is used to refer to a pre-chosen probability and the term p-value is used to indicate a probability that we calculate after a given study. The significance level (alpha) is the probability of type I error. The power of a test is one minus the probability of type II error (beta).

Alpha, the significance level, is the probability that we will make the mistake of rejecting the null hypothesis when in fact it is true. The p-value, or calculated probability, is the probability of finding the observed, or more extreme, results when the null hypothesis (H0) of a study question is true.
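Alpha, power, the minimum detectable effect, and the sample size are linked, which is how the minimum sample size mentioned earlier is usually determined. The sketch below is a minimal power-analysis example assuming a conversion-rate metric and using statsmodels; the 10% baseline rate, the 12% target rate, and the 80% power are illustrative assumptions rather than recommendations.

```python
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

# Illustrative assumptions: 10% baseline conversion rate and a
# minimum detectable lift to 12% in the variant group.
effect_size = proportion_effectsize(0.10, 0.12)

# Solve for the per-group sample size at alpha = 0.05 and 80% power.
analysis = NormalIndPower()
n_per_group = analysis.solve_power(effect_size=effect_size,
                                   alpha=0.05,
                                   power=0.80,
                                   ratio=1.0,
                                   alternative="two-sided")
print(f"Minimum sample size per group: {round(n_per_group)}")
```

The smaller the lift we want to detect, the larger the required sample size, which is why the minimum detectable effect should be agreed on before the test starts.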

The null hypothesis is usually a hypothesis of “no difference” and the alternative hypothesis (H1) is the opposite of the null hypothesis.

If your p-value is less than the chosen significance level, then you reject the null hypothesis, i.e. you accept that your sample gives reasonable evidence to support the alternative hypothesis. A statistically significant result does NOT imply a “meaningful” or “important” difference; that is for you to decide when considering the real-world relevance of your result.
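Putting the decision rule together, the following is a minimal sketch using scipy, assuming the metric is a continuous per-user value; the simulated data, group means, and sample sizes are made up purely for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Simulated per-user metric values, for illustration only.
control = rng.normal(loc=10.0, scale=2.0, size=1000)
variant = rng.normal(loc=10.3, scale=2.0, size=1000)

alpha = 0.05  # pre-chosen significance level

# Welch's two-sample t-test (does not assume equal variances).
t_stat, p_value = stats.ttest_ind(variant, control, equal_var=False)

print(f"t-statistic = {t_stat:.3f}, p-value = {p_value:.4f}")
if p_value < alpha:
    print("Reject the null hypothesis: the difference is statistically significant.")
else:
    print("Fail to reject the null hypothesis.")
```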

The choice of significance level at which we reject H0 is somewhat arbitrary. Conventionally, the 5% (less than a 1 in 20 chance of being wrong), 1%, and 0.1% levels (P < 0.05, 0.01, and 0.001) have been used.
