# Principal Component Analysis

In most datasets, many of the features are correlated. The more features there are, the harder it becomes to visualize the training set and then work with it. Dimensionality reduction is the process of reducing the number of random variables under consideration by obtaining a set of principal variables.

Principal Component Analysis (PCA) is a statistical technique used for dimensionality reduction in machine learning, where dimensionality reduction means reducing the number of input variables for a predictive model. Because PCA does not use the target labels, it can be thought of as an unsupervised learning technique.

There are two main ways to achieve dimensionality reduction:

- Feature Elimination
- Feature Extraction

In feature elimination, we drop all variables except the ones we think will best predict the target variable. The advantages of feature elimination methods include simplicity and preserving the interpretability of your variables. The disadvantage is that we lose the information contained in the variables we have dropped.

Principal Component Analysis (PCA) is a feature extraction technique. It reduces the dimensionality of a given dataset by deriving new features from the original features present in the dataset. That is, it combines the input variables (or features) in a specific way to produce “new” features that retain the most valuable information from all the original features. The “new” variables produced by PCA are uncorrelated with one another.
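As a quick illustration, the sketch below extracts two uncorrelated “new” features from four correlated ones using scikit-learn (the synthetic dataset and the choice of two components are illustrative assumptions, not prescribed by PCA itself):

```python
# PCA as feature extraction: 4 correlated features -> 2 uncorrelated components.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# 100 samples; the last two features are noisy copies of the first two
base = rng.normal(size=(100, 2))
X = np.hstack([base, base + 0.1 * rng.normal(size=(100, 2))])

# Standardize, then extract 2 "new" features (principal components)
X_std = StandardScaler().fit_transform(X)
pca = PCA(n_components=2)
X_new = pca.fit_transform(X_std)

print(X_new.shape)  # (100, 2)
# The extracted components are mutually uncorrelated
print(abs(np.corrcoef(X_new.T)[0, 1]) < 1e-6)  # True
```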

## Assumptions

PCA is based on the Pearson correlation coefficient framework and inherits similar assumptions.

1. Sample size: a minimum of roughly 150 observations, and ideally a 5:1 ratio of observations to features.
2. Correlations: The feature set is correlated, so the reduced feature set effectively represents the original data space.
3. Linearity: the relationships between variables are linear, and each principal component is a linear combination of the original features.
4. Outliers: No significant outliers in the data as these can have a disproportionate influence on the results.
5. Large variance implies more structure: high-variance axes are treated as principal components, while low-variance axes are treated as noise and discarded.
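Some of these assumptions can be screened quickly in code. The sketch below checks sample size, inter-feature correlation, and outliers on synthetic data (the thresholds are illustrative rules of thumb, not part of PCA itself):

```python
# Quick screens for the PCA assumptions above (thresholds are illustrative).
import numpy as np

rng = np.random.default_rng(1)
base = rng.normal(size=(300, 1))
# 4 features that are noisy copies of one underlying signal -> highly correlated
X = np.hstack([base + 0.2 * rng.normal(size=(300, 1)) for _ in range(4)])

# 1. Sample size: at least 150 observations and a 5:1 observation/feature ratio
n, d = X.shape
print(n >= 150 and n / d >= 5)

# 2. Correlations: PCA is only useful if the features are correlated
corr = np.corrcoef(X, rowvar=False)
off_diag = corr[~np.eye(d, dtype=bool)]
print(np.abs(off_diag).mean() > 0.3)

# 4. Outliers: flag values far from the mean with a simple z-score screen
z = np.abs((X - X.mean(axis=0)) / X.std(axis=0))
print(int((z > 4).sum()))  # count of extreme values
```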

## Mathematics behind PCA

The whole process of obtaining principal components from a raw dataset can be simplified into six steps:

- Take the whole dataset consisting of d + 1 dimensions and ignore the labels, so that the dataset becomes d-dimensional.
- Compute the mean of every dimension of the whole dataset.
- Compute the covariance matrix of the whole dataset.
- Compute the eigenvectors and their corresponding eigenvalues.
- Sort the eigenvectors by decreasing eigenvalue and choose the k eigenvectors with the largest eigenvalues to form a d × k matrix W.
- Use this d × k eigenvector matrix W to transform the samples onto the new subspace.
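The six steps above can be sketched directly in NumPy (the dataset here is a toy example; `k = 2` is an arbitrary choice):

```python
# PCA from scratch, following the six steps above.
import numpy as np

rng = np.random.default_rng(42)
# Step 1: take the dataset (200 samples, d = 3 features, labels already dropped)
base = rng.normal(size=(200, 1))
X = np.hstack([base,
               2 * base + rng.normal(size=(200, 1)),
               rng.normal(size=(200, 1))])

# Step 2: mean of every dimension
mean = X.mean(axis=0)

# Step 3: covariance matrix of the (centered) dataset
cov = np.cov(X - mean, rowvar=False)

# Step 4: eigenvectors and eigenvalues (eigh, since cov is symmetric)
eigvals, eigvecs = np.linalg.eigh(cov)

# Step 5: sort by decreasing eigenvalue and keep the k = 2 largest
order = np.argsort(eigvals)[::-1]
k = 2
W = eigvecs[:, order[:k]]  # d x k projection matrix

# Step 6: transform the samples onto the new subspace
X_proj = (X - mean) @ W
print(X_proj.shape)  # (200, 2)
```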

[An eigenvector is a vector whose direction remains unchanged when a linear transformation is applied to it.]
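This defining property, A v = λ v, can be verified numerically on a small symmetric matrix (the matrix is an arbitrary example):

```python
# Verify the eigenvector property A v = lambda v on a 2x2 symmetric matrix.
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
eigvals, eigvecs = np.linalg.eigh(A)  # eigenvalues in ascending order: 1 and 3

v = eigvecs[:, 1]  # eigenvector for the largest eigenvalue, lambda = 3
# A v points in the same direction as v, merely scaled by the eigenvalue
print(np.allclose(A @ v, eigvals[1] * v))  # True
```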