Topic 11 Revisiting Old Tools
11.1 Discussion
Logistic regression is fit with a technique called maximum likelihood. What is this technique?
Example: flipping a coin 3 times, unknown probability $p$ of getting heads
- The data: 2 heads, 1 tail
- 3 ways that could have happened:
  - T, H, H
  - H, T, H
  - H, H, T
- Each way has probability $p^2(1-p)$
- The probability of seeing my data (as a function of $p$) is $3p^2(1-p)$
- Goal: Find the $p$ that maximizes that probability
- The 3 doesn’t matter for this, so we usually remove such constant terms.
- $p^2(1-p)$ is the likelihood function.
- Imagine (in a non-coin-flipping situation) that we wanted $p$ to depend on predictors…can we relate this back to logistic regression?
- Yes!
  $$p = \frac{e^{\beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p}}{1 + e^{\beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p}}$$
- The value of the likelihood function at the best $p$ is a number: the likelihood of the data (a numerical sketch of the coin example follows this list).
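To make the maximization concrete, here is a minimal numerical sketch of the coin example (assuming only numpy and scipy; the data are hard-coded as 2 heads and 1 tail). Setting the derivative of $p^2(1-p)$ to zero gives $\hat{p} = 2/3$, and the optimizer agrees:

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Likelihood for 2 heads and 1 tail, constant dropped: L(p) = p^2 (1 - p).
# Maximizing L(p) is the same as minimizing the negative log-likelihood.
def neg_log_lik(p):
    return -(2 * np.log(p) + np.log(1 - p))

result = minimize_scalar(neg_log_lik, bounds=(1e-6, 1 - 1e-6), method="bounded")
print(result.x)  # about 0.6667, i.e. p-hat = 2/3
```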
Connecting to stepwise selection
- For logistic regression, stepwise selection works exactly the same, except that the likelihood is used instead of RSS (see the sketch after this list).
- Cross-validated accuracy is an option too.
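As a sketch of what one forward step could look like with this criterion (assuming a recent scikit-learn, a numeric predictor matrix `X`, and binary labels `y`; the helpers `log_likelihood` and `forward_step` are hypothetical names, not from these notes):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss

def log_likelihood(model, X, y):
    # log_loss is the average negative log-likelihood,
    # so flip the sign and undo the averaging
    return -log_loss(y, model.predict_proba(X)) * len(y)

def forward_step(X, y, current, candidates):
    # Try adding each remaining predictor to the current model;
    # keep whichever addition gives the largest log-likelihood
    best_j, best_ll = None, -np.inf
    for j in candidates:
        cols = current + [j]
        model = LogisticRegression(penalty=None).fit(X[:, cols], y)  # plain ML fit
        ll = log_likelihood(model, X[:, cols], y)
        if ll > best_ll:
            best_j, best_ll = j, ll
    return best_j, best_ll
```

Repeating this step until no addition helps (or scoring candidate models by cross-validated accuracy instead) gives the full forward procedure.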
Connecting to GAMs
- There can be nonlinear relationships between quantitative predictors and log odds as well.
- e.g. $\log \text{odds}(\text{foreclosure}) = \beta_0 + f_1(\text{Age}) + f_2(\text{Price})$
- Build $f_1$ and $f_2$ from LOESS or splines
- If the relationships are truly nonlinear, this should improve prediction accuracy (see the sketch below).
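Here is a minimal sketch of the foreclosure example, assuming the pygam package; the data below are entirely synthetic, made up just so the example runs end to end:

```python
import numpy as np
from pygam import LogisticGAM, s

rng = np.random.default_rng(0)
n = 500
X = np.column_stack([rng.uniform(20, 80, n),     # Age
                     rng.uniform(50, 500, n)])   # Price (made-up units)
# Invented nonlinear log odds, only to generate a binary outcome
log_odds = -1 + 0.002 * (X[:, 0] - 50) ** 2 - 0.005 * X[:, 1]
y = rng.binomial(1, 1 / (1 + np.exp(-log_odds)))

# s(0) and s(1) are smooth spline terms for Age and Price,
# playing the roles of f1 and f2
gam = LogisticGAM(s(0) + s(1)).fit(X, y)
print(gam.predict_proba(X[:5]))  # estimated foreclosure probabilities
```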
11.2 Exercises
- Consider how the LASSO would be extended to the logistic regression setting. Using the penalized least squares criterion as a reference (restated below), how would you write a penalized criterion for logistic regression using the likelihood?
- On a different note, unrelated to likelihood: consider the KNN algorithm for regression. How would you modify the algorithm to…
  - make a hard classification?
  - produce a “soft” classification? A “soft” classification for a test case is an estimated probability of being in each of the K classes.
A concrete example to frame your answers: say that for a particular test case, you found its 10 nearest neighbors in the training set. 5 were of Class A, 3 of Class B, and 2 of Class C. (A short code check of this example appears below.)
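For reference on the first exercise, the penalized least squares (LASSO) criterion in the regression setting is

$$\sum_{i=1}^{n}\left(y_i - \beta_0 - \sum_{j=1}^{p}\beta_j x_{ij}\right)^2 + \lambda\sum_{j=1}^{p}|\beta_j|,$$

so the question amounts to deciding what should replace the RSS term when the model is fit by maximum likelihood.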
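For the KNN exercise, here is one way to check your answers against scikit-learn, with the 10 neighbors from the concrete example encoded as a tiny made-up training set (all points sit at the same location, so every training point is a nearest neighbor of the test case):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Made-up data matching the concrete example:
# 10 neighbors, 5 of Class A, 3 of Class B, 2 of Class C
X_train = np.zeros((10, 1))
y_train = np.array(["A"] * 5 + ["B"] * 3 + ["C"] * 2)
x_test = np.zeros((1, 1))

knn = KNeighborsClassifier(n_neighbors=10).fit(X_train, y_train)
print(knn.predict(x_test))        # hard classification: majority class
print(knn.predict_proba(x_test))  # soft classification: one probability per class
```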