Coefficient of Correlation: It is the degree of relationship between two variables. A correlation value can be computed for any two variables; if they are unrelated, that value is simply 0. The correlation value always lies between -1 and 1, with 0 meaning no linear correlation at all. A value of 1 indicates that the two variables move in unison: they rise and fall together, in perfect positive correlation. A value of -1 means the two variables are perfect opposites: when one goes up the other goes down, in perfect negative correlation.
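A minimal sketch of these three cases, using numpy's built-in `np.corrcoef` on small made-up data sets (the numbers here are illustrative, not from any real data):

```python
import numpy as np

# Hypothetical data: series constructed to give r = 1, r = -1, and r = 0.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_up = 2 * x + 1        # moves in unison with x
y_down = -3 * x + 10    # moves opposite to x

r_up = np.corrcoef(x, y_up)[0, 1]
r_down = np.corrcoef(x, y_down)[0, 1]
print(round(r_up, 4))    # 1.0
print(round(r_down, 4))  # -1.0

# Zero correlation does not always mean "no relationship at all":
# y = x^2 on a symmetric range is clearly related to x, yet has
# zero *linear* correlation.
x_sym = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
r_zero = np.corrcoef(x_sym, x_sym ** 2)[0, 1]
print(round(r_zero, 4))  # 0.0
```

The last case is a useful caveat: the coefficient measures linear association only.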
Coefficient of determination: It is the square of the Coefficient of Correlation and shows the percentage of variation in the target variable (y) that is explained by all the predictor variables (X) together. It always lies between 0 and 1. R square gives the proportion of the variance (fluctuation) of one variable that is predictable from the other variable(s). It is a measure that tells us how confident we can be in making predictions from a given model.
The coefficient of determination is a measure of how well the regression line represents the data. If the regression line passed exactly through every point on the scatter plot, it would explain all of the variation. The further the line is from the points, the less it can explain. So the higher the value, the better the model.
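The idea above can be sketched numerically: R square is one minus the ratio of the residual sum of squares to the total sum of squares, and for simple linear regression it equals the squared correlation coefficient. The data below is made up for illustration:

```python
import numpy as np

# Hypothetical noisy data around a roughly linear trend.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 11.9])

# Fit a least-squares line y ≈ m*x + b.
m, b = np.polyfit(x, y, 1)
y_hat = m * x + b

# R² = 1 - (residual sum of squares / total sum of squares).
ss_res = np.sum((y - y_hat) ** 2)
ss_tot = np.sum((y - np.mean(y)) ** 2)
r_squared = 1 - ss_res / ss_tot

# For simple linear regression, R² equals the squared correlation.
r = np.corrcoef(x, y)[0, 1]
print(np.isclose(r_squared, r ** 2))  # True
```

If the fitted line passed exactly through every point, `ss_res` would be 0 and `r_squared` would be exactly 1.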
Correlation is straightforward to interpret for simple linear regression, because there is only one x and one y variable. For multiple linear regression, R can still be computed, but it is difficult to interpret because multiple predictor variables are involved, so we use R square instead. R square can be interpreted the same way for both simple and multiple linear regression.
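To show that R square carries over unchanged to multiple predictors, here is a sketch that fits a least-squares model with two predictors (hypothetical data, fitted via numpy's `lstsq`) and computes R square by the same residual-based formula:

```python
import numpy as np

# Hypothetical data: two predictor columns and one target.
X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 4.0],
              [4.0, 3.0],
              [5.0, 6.0]])
y = np.array([5.2, 4.1, 11.3, 10.2, 17.1])

# Add an intercept column and solve the least-squares problem.
A = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
y_hat = A @ coef

# R² has the same definition no matter how many predictors there are.
ss_res = np.sum((y - y_hat) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot
print(0 <= r_squared <= 1)  # True
```

There is no single pairwise correlation to report here, but the R square value still answers "what fraction of the variation in y does the model explain?".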
Why do we take the squared differences and not simply the absolute differences? Because the squared differences make it easier to derive the regression line: to find that line we need to compute the first derivative of the cost function, and it is much harder to differentiate absolute values than squared values (the absolute value function is not differentiable at zero). Also, squaring magnifies larger error distances, making bad predictions more pronounced than good ones.
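The second point can be shown with a tiny made-up comparison: two sets of residuals with the same total absolute error, one spread across many small misses and one concentrated in a single big miss.

```python
# Two hypothetical sets of residuals with the same absolute-error total:
# many small errors vs. one large error.
small_errors = [1.0, 1.0, 1.0, 1.0]   # sum of |e| = 4
one_big_error = [4.0, 0.0, 0.0, 0.0]  # sum of |e| = 4

sae_small = sum(abs(e) for e in small_errors)
sae_big = sum(abs(e) for e in one_big_error)
sse_small = sum(e ** 2 for e in small_errors)
sse_big = sum(e ** 2 for e in one_big_error)

print(sae_small, sae_big)  # 4.0 4.0  -> absolute loss cannot tell them apart
print(sse_small, sse_big)  # 4.0 16.0 -> squared loss punishes the big miss
```

Under the absolute loss both fits look equally good; under the squared loss the single large miss costs four times as much, which is exactly the "bad predictions are more pronounced" behavior described above.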
Happy Learning!