K Means Clustering

Clustering is one of the most common exploratory data analysis techniques used to get an intuition about the structure of the data. K-means clustering is one of the simplest and popular unsupervised machine learning algorithms.

K-Means Clustering is an algorithm that, given a dataset, will identify which data points belong to each one of the clusters. It identifies subgroups in the data such that data points in the same subgroup (cluster) are very similar while data points in different clusters are very different. 

K-means clustering is a good place to start exploring an unlabeled dataset. The K in K-Means denotes the number of clusters. This algorithm is bound to converge to a solution after some iterations.

It has 4 basic steps:

  1. Initialize Cluster Centroids
  2. Assign datapoints to Clusters
  3. Update Cluster centroids
  4. Repeat steps 2–3 until the stopping condition is met.

Through a series of iterations, the algorithm creates groups of data points referred to as clusters.

Comments are closed.