Topic 13: Evaluating Classification Models (Part 1)
Learning Goals
- Calculate (by hand from confusion matrices) and contextually interpret overall accuracy, sensitivity, and specificity
- Construct and interpret plots of predicted probabilities across classes
- Explain how a ROC curve is constructed and the rationale behind AUC as an evaluation metric
- Appropriately use and interpret the no-information rate to evaluate accuracy metrics
Slides from today are available here.
Exercises
No R work today; the exercises are purely conceptual. (The code sketches below are optional illustrations, not required work.)
Exercise 1: Revisiting LASSO
LASSO for logistic regression works analogously to LASSO for linear regression: the \(\lambda\) penalty shrinks coefficient estimates toward zero, dropping some predictors from the model entirely.

- What would you expect about the number of predictors in the model for high vs. low \(\lambda\)?
- Suppose that a LASSO logistic model has been built. Describe the steps required to make a soft (probability) and hard (class) prediction for a test case. (A code sketch follows this list.)
- Why should we not rely solely on the default probability threshold of 0.5 when making predictions? What, then, is the motivation for using the ROC_AUC metric? (Make sure that you can describe how the ROC curve is constructed and what the optimal value of ROC_AUC is.)
- How would you expect a plot of test ROC_AUC vs. \(\lambda\) to look, and why? (Draw it!)
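To make the prediction steps concrete, here is a minimal tidymodels sketch. The names `lasso_fit` (a fitted LASSO logistic regression workflow), `spam_test`, and the outcome column `type` are hypothetical stand-ins, not objects from class.

```r
library(tidymodels)

# Hypothetical objects: lasso_fit is a fitted LASSO logistic regression
# workflow; spam_test is a test set with factor outcome `type`
# (levels: not_spam, spam).

# Soft predictions: a predicted probability for each class
soft_preds <- predict(lasso_fit, new_data = spam_test, type = "prob")

# Hard predictions: classes from thresholding the probabilities (0.5 by default)
hard_preds <- predict(lasso_fit, new_data = spam_test, type = "class")

# ROC curve: sweep the threshold from 0 to 1, recording sensitivity and
# (1 - specificity) at each threshold; ROC_AUC is the area under that curve.
results <- bind_cols(spam_test["type"], soft_preds)
roc_curve(results, truth = type, .pred_spam, event_level = "second") %>% autoplot()
roc_auc(results, truth = type, .pred_spam, event_level = "second")
```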
Exercise 2: Revisiting KNN
The KNN algorithm extends naturally to the classification setting. Suppose that we're using KNN to build a spam classifier.

- How do we find the nearest neighbors for a test case? Why might it be a sensible idea to normalize all quantitative predictors to have a standard deviation of 1?
- Suppose that the 5 nearest neighbors for a given test case have outcome values: spam, spam, not_spam, spam, not_spam. How might we make a soft (probability) and hard (class) prediction for this test case? (See the sketch after this list.)
- How would you expect a plot of test overall accuracy vs. # neighbors to look, and why? (Draw it!)
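As a check on the neighbor-vote logic in the second question, here is a small base R sketch (the names are just for illustration):

```r
# Outcomes of the 5 nearest neighbors for the test case
neighbors <- c("spam", "spam", "not_spam", "spam", "not_spam")

# Soft prediction: the proportion of neighbors in each class
table(neighbors) / length(neighbors)
#> neighbors
#> not_spam     spam
#>      0.4      0.6

# Hard prediction: the majority class among the neighbors
names(which.max(table(neighbors)))
#> [1] "spam"
```

In tidymodels, standardizing the quantitative predictors (mean 0, standard deviation 1) would typically be handled by `step_normalize()` in a recipe, so that no single predictor dominates the distance calculation.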
Exercise 3: Confusion matrix computations
We obtain the following confusion matrix after using a “trained” KNN model on a test dataset.
|                    | Truth: Not spam | Truth: Spam |
|--------------------|----------------:|------------:|
| Predicted Not spam |            2670 |         220 |
| Predicted Spam     |             118 |        1593 |
- Compute and interpret the following metrics: overall accuracy, sensitivity, specificity. (A checking sketch follows this list.)
- Compute the no-information rate and use it to give context for how “high” the overall accuracy is.
- When building a spam classifier, which of overall accuracy, sensitivity, or specificity do you think matters most and why?
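After working the questions by hand, you can check your arithmetic with a sketch like the following (treating spam as the positive class):

```r
# Counts from the confusion matrix above
TN <- 2670  # predicted not spam, truly not spam (true negatives)
FN <- 220   # predicted not spam, truly spam     (false negatives)
FP <- 118   # predicted spam,     truly not spam (false positives)
TP <- 1593  # predicted spam,     truly spam     (true positives)
n  <- TN + FN + FP + TP

(TP + TN) / n              # overall accuracy: fraction of all cases classified correctly
TP / (TP + FN)             # sensitivity: fraction of true spam correctly flagged
TN / (TN + FP)             # specificity: fraction of true non-spam correctly kept
max(TP + FN, TN + FP) / n  # no-information rate: accuracy from always predicting the majority class
```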
Exercise 4: Connecting ROC curves with predicted probability boxplots
Suppose that Model 1 has an ROC_AUC of 0.85 and that Model 2 has an ROC_AUC of 0.6. What would you expect the predicted probability boxplots (predicted probabilities vs. true class) to look like for these two models, and why? (Draw them.)
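To check your drawing, here is a simulated illustration. The two "models" below are invented: the class-specific score distributions were chosen so that the AUCs land near 0.85 and 0.6, and they are not output from any real fitted model.

```r
library(tidymodels)

# Simulate a predicted spam probability for 200 truly-not-spam and
# 200 truly-spam test cases under each invented model.
set.seed(13)
sim <- tibble(
  truth = factor(rep(c("not_spam", "spam"), each = 200)),
  # Logit-scale scores: classes far apart for Model 1, barely apart for Model 2.
  # The shifts were picked so the AUCs come out near 0.85 and 0.6.
  model_1 = plogis(rnorm(400, mean = rep(c(-0.73, 0.73), each = 200))),
  model_2 = plogis(rnorm(400, mean = rep(c(-0.18, 0.18), each = 200)))
) %>%
  pivot_longer(starts_with("model"), names_to = "model", values_to = "prob_spam")

# Boxplots of predicted spam probability vs. true class, one panel per model
ggplot(sim, aes(x = truth, y = prob_spam)) +
  geom_boxplot() +
  facet_wrap(~ model)

# Confirm the simulated AUCs roughly match the exercise's values
sim %>%
  group_by(model) %>%
  roc_auc(truth, prob_spam, event_level = "second")
```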