In most datasets, many of the features are correlated. The higher the number of features, the harder it becomes to visualize the training set and work with it. Dimensionality reduction is the process of reducing the number of random variables under consideration by obtaining a set of principal variables.

Principal Component Analysis (PCA) is a statistical technique for dimensionality reduction in machine learning, that is, for reducing the number of input variables used by a predictive model. Since it makes no use of target labels, PCA can be thought of as an unsupervised learning method.

There are mainly two different ways to achieve dimensionality reduction:

- Feature Elimination
- Feature Extraction

In feature elimination, we drop all variables except the ones we think will best predict the target variable. The advantages of feature elimination methods include simplicity and preserving the interpretability of your variables. The disadvantage is that we lose whatever information was contained in the variables we dropped.

Principal Component Analysis (PCA) is a feature extraction technique. It reduces the dimensionality of a dataset by constructing new features from the original ones. That is, it combines the input variables (or features) in a specific way and produces "new" features that retain the most valuable information from all of the original features. The "new" variables produced by PCA are all independent of one another.
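As an illustration, here is a minimal sketch using scikit-learn's `PCA` on the Iris dataset (the dataset choice and library are assumptions, not part of the original text); note that the extracted features come out uncorrelated:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

# Load a 4-feature dataset and extract 2 new features from it
X = load_iris().data               # shape (150, 4)
pca = PCA(n_components=2)
X_new = pca.fit_transform(X)       # shape (150, 2)

# The "new" variables are independent of one another:
# their pairwise correlation is (numerically) zero.
corr = np.corrcoef(X_new, rowvar=False)
print(X_new.shape)
print(abs(corr[0, 1]))
```

Here four correlated measurements are compressed into two uncorrelated components, which is exactly the feature-extraction behavior described above.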

**Assumptions**

PCA is based on the **Pearson correlation coefficient** framework and inherits similar assumptions.

- *Sample size:* Minimum of 150 observations, and ideally a 5:1 ratio of observations to features.
- *Correlations:* The feature set is correlated, so the reduced feature set effectively represents the original data space.
- *Linearity:* All variables exhibit a constant multivariate normal relationship, and principal components are a linear combination of the original features.
- *Outliers:* No significant outliers in the data, as these can have a disproportionate influence on the results.
- *Large variance implies more structure:* High-variance axes are treated as principal components, while low-variance axes are treated as noise and discarded.
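The correlation assumption is easy to sanity-check before applying PCA: standardize the data and inspect the off-diagonal entries of the Pearson correlation matrix. A minimal sketch with NumPy, using a synthetic toy matrix (the data here is made up purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy data: the second feature is strongly correlated with the first,
# the third is independent noise.
x1 = rng.normal(size=200)
X = np.column_stack([
    x1,
    2 * x1 + rng.normal(scale=0.1, size=200),
    rng.normal(size=200),
])

# Standardize to zero mean and unit variance (PCA is scale-sensitive)
Xs = (X - X.mean(axis=0)) / X.std(axis=0)

# Large off-diagonal correlations suggest PCA can usefully compress the data
corr = np.corrcoef(Xs, rowvar=False)
print(np.round(corr, 2))
```

If all off-diagonal correlations were near zero, PCA would have little redundancy to exploit and the reduced feature set would represent the original space poorly.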

**Mathematics behind PCA**

The whole process of obtaining principal components from a raw dataset can be simplified into six steps:

- Take the whole dataset consisting of *d + 1* dimensions and ignore the labels, so that our new dataset becomes *d*-dimensional.
- Compute the *mean* for every dimension of the whole dataset.
- Compute the *covariance matrix* of the whole dataset.
- Compute the *eigenvectors* and the corresponding *eigenvalues*.
- Sort the eigenvectors by decreasing eigenvalues and choose the *k* eigenvectors with the largest eigenvalues to form a *d × k* dimensional matrix **W**.
- Use this *d × k* eigenvector matrix **W** to transform the samples onto the new subspace.

[*An eigenvector is a vector whose direction remains unchanged when a linear transformation is applied to it.*]
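The six steps above can be sketched directly in NumPy. This is a bare-bones illustration on random data (the function name and the toy input are assumptions for demonstration), not a production implementation:

```python
import numpy as np

def pca(X, k):
    """Project X (n samples x d features) onto its top-k principal components."""
    # Step 2: compute the mean of every dimension and center the data
    X_centered = X - X.mean(axis=0)
    # Step 3: covariance matrix of the whole dataset (d x d)
    cov = np.cov(X_centered, rowvar=False)
    # Step 4: eigenvectors and eigenvalues (eigh, since cov is symmetric)
    eigenvalues, eigenvectors = np.linalg.eigh(cov)
    # Step 5: sort by decreasing eigenvalue and keep k eigenvectors -> W (d x k)
    order = np.argsort(eigenvalues)[::-1]
    W = eigenvectors[:, order[:k]]
    # Step 6: transform the samples onto the new k-dimensional subspace
    return X_centered @ W

rng = np.random.default_rng(42)
X = rng.normal(size=(100, 5))   # 100 samples, d = 5 features
X_reduced = pca(X, 2)
print(X_reduced.shape)           # (100, 2)
```

`np.linalg.eigh` is used rather than `np.linalg.eig` because the covariance matrix is symmetric, which guarantees real eigenvalues and orthogonal eigenvectors.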