ROC Curve and AUC

ROC curves and AUC are used to measure performance in machine earning. They are the most widely used evaluation metrics for checking any classification model’s performance. It tells how much the model is capable of distinguishing between classes.

ROC (Receiver Operator Characteristic Curve) is a probability curve and AUC represents the degree or measure of separability. Higher the AUC, the better the model is at predicting 0s as 0s and 1s as 1s.

ROC curves in logistic regression are used for determining the best cutoff value for predicting whether a new observation is a “failure” (0) or a “success” (1). It is generated by plotting the True Positive Rate (y-axis) against the False Positive Rate (x-axis).

There is a tradeoff between the True Positive Rate and the False Positive Rate, or simply, a tradeoff between sensitivity and specificity. When you plot the true positive rate against the false positive rate, you get a graph which shows the trade-off between them and this curve is known as the ROC curve.

If false negatives are worse than false positives, then choose a cutoff with high sensitivity (a value higher on the Y axis ofthe ROC graph). Alternatively, if false positives are worse, then pick a cutoff with high specificity (values to the left in the ROC graph).

For a completely random model, the ROC curve will pass through the 45-degree line that has been shown in the graph above and in the best case it passes through the upper left corner of the graph. So the least area that an ROC curve can have is 0.5, and the highest area it can have is 1.

The ROC curve shows the trade-off between True Positive Rate and False Positive Rate which essentially can also be viewed as a trade-off between Sensitivity and Specificity. As you can see, on the Y-axis, you have the values of Sensitivity and on the X-axis, you have the value of (1 – Specificity).

Notice that in the curve when Sensitivity is increasing, (1 – Specificity), And since, (1 – Specificity) is increasing, it simply means that Specificity is decreasing.  That means if you increase sensitivity, the specificity will reduce.  The optimal cut-off point exists where the values of accuracy, sensitivity, and specificity are fairly decent and almost equal.

AUC stands for Area under the curve. AUC gives the rate of successful classification by the logistic model. The AUC makes it easy to compare the ROC curve of one model to another.

The area under the curve tells you how good a model is. If the curve is more towards the top-left corner, area is more, and hence, the model is better. As you can see, of the three curves, curve ‘C’ is most towards the top-left corner and thus, has the highest area resulting in it being the best model.

AUC and ROC are important evaluation metrics for calculating the performance of any classification model’s performance.