Topic 11 Revisiting Old Tools

11.1 Discussion

Logistic regression is fit with a technique called maximum likelihood. What is this technique?

Example: flipping a coin 3 times, unknown probability \(p\) of getting heads

  • The data: 2 heads, 1 tail
  • 3 ways that could have happened:
    T, H, H
    H, T, H
    H, H, T
  • Each way has probability \(p^2(1-p)\)
  • The probability of seeing my data (as a function of \(p\)) is \(3p^2(1-p)\)
  • Goal: Find the \(p\) that maximizes that probability
    • The 3 doesn’t matter for this, so we usually remove such constant terms.
    • \(p^2(1-p)\) is the likelihood function.
    • Imagine (in a non-coin-flipping situation) that we wanted \(p\) to depend on predictors… can we relate this back to logistic regression?
    • Yes!
      \[p = \frac{e^{\beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p}}{1 + e^{\beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p}}\]
    • The value of the likelihood function at the best \(p\) is a number: the likelihood of the data (see the sketch below).
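
A minimal numerical sketch of this maximization (Python/NumPy; the grid search is only for illustration, since setting the derivative of \(p^2(1-p)\) to zero gives \(p = 2/3\), the sample proportion of heads):

```python
import numpy as np

# Likelihood of the observed data (2 heads, 1 tail) as a function of p,
# with the constant factor of 3 dropped, as noted above.
def likelihood(p):
    return p**2 * (1 - p)

# Grid search over candidate values of p in [0, 1].
grid = np.linspace(0, 1, 1001)
best_p = grid[np.argmax(likelihood(grid))]

print(best_p)              # ~0.667, i.e. p = 2/3
print(likelihood(best_p))  # ~0.148, the likelihood of the data at the best p
```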




Connecting to stepwise selection

  • For logistic regression, stepwise selection works exactly the same way, except that the likelihood (equivalently, the deviance) is used instead of RSS to compare candidate models; see the sketch below.
  • Cross-validated accuracy is an option too.
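
A sketch of what forward stepwise selection might look like here, assuming a NumPy predictor matrix `X` and a 0/1 response `y`, and using statsmodels to fit each candidate logistic regression (the setup and variable names are illustrative, not from the notes):

```python
import statsmodels.api as sm

def forward_stepwise(X, y):
    """Forward stepwise selection for logistic regression.

    At each step, add the predictor whose inclusion gives the largest
    log-likelihood (for models of equal size, maximizing the likelihood
    is the analogue of minimizing RSS in the linear setting).
    """
    remaining = list(range(X.shape[1]))
    selected = []
    path = []
    while remaining:
        # Fit one candidate model per remaining predictor; keep the best.
        def llf(j):
            cols = sm.add_constant(X[:, selected + [j]])
            return sm.Logit(y, cols).fit(disp=0).llf
        best = max(remaining, key=llf)
        selected.append(best)
        remaining.remove(best)
        path.append(list(selected))
    # 'path' holds the best model of each size; choose among sizes with
    # AIC/BIC or cross-validated accuracy, as in the linear setting.
    return path
```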




Connecting to GAMs

  • There can be nonlinear relationships between quantitative predictors and log odds as well.
  • e.g. \(\text{log odds(foreclosure)} = \beta_0 + f_1(\text{Age}) + f_2(\text{Price})\)
    • Build \(f_1\) and \(f_2\) from LOESS or splines
  • If the relationships are truly nonlinear, this should improve prediction accuracy (see the sketch below).
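
A minimal sketch of fitting such a model with the pygam package (one of several options; the synthetic data below is a stand-in for the foreclosure example, not real data):

```python
import numpy as np
from pygam import LogisticGAM, s  # assumes the pygam package is installed

# Synthetic stand-in for the foreclosure example: column 0 plays the role
# of Age, column 1 of Price, and y is a 0/1 foreclosure indicator.
rng = np.random.default_rng(0)
age = rng.uniform(20, 80, 500)
price = rng.uniform(100, 900, 500)
true_log_odds = -1 + 0.002 * (age - 50) ** 2 - 0.003 * price  # nonlinear in Age
y = rng.binomial(1, 1 / (1 + np.exp(-true_log_odds)))
X = np.column_stack([age, price])

# log odds = beta_0 + f1(Age) + f2(Price), with each s() term a smooth
# spline function of one column of X.
gam = LogisticGAM(s(0) + s(1)).fit(X, y)
probs = gam.predict_proba(X)  # estimated P(foreclosure) for each case
```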




11.2 Exercises

  1. Consider how LASSO would be extended to the logistic regression setting. Using the penalized least squares criterion as a reference, how would you write a penalized criterion for logistic regression using the likelihood?
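
    For reference, the penalized least squares criterion that LASSO minimizes in the linear regression setting is
    \[\sum_{i=1}^{n} \Big(y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij}\Big)^2 + \lambda \sum_{j=1}^{p} |\beta_j|\]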




  2. On a different note, unrelated to likelihood, consider the KNN algorithm for regression. How would you modify the algorithm to…
    • make a hard classification?
    • produce a “soft” classification? A “soft” classification for a test case is an estimated probability of being in each of the \(K\) classes.
      A concrete example to frame your answers: say that for a particular test case, you found its 10 nearest neighbors in the training set. 5 were of Class A, 3 of Class B, and 2 of Class C.
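
One possible way to organize the counting for that concrete example, as a sketch (plain Python; whether these counts give reasonable hard and soft classifications is for you to decide):

```python
from collections import Counter

# Labels of the 10 nearest neighbors from the concrete example above.
neighbor_labels = ["A"] * 5 + ["B"] * 3 + ["C"] * 2

counts = Counter(neighbor_labels)

# One candidate hard classification: the majority class among neighbors.
hard = counts.most_common(1)[0][0]  # 'A'

# One candidate soft classification: the class proportions among neighbors.
soft = {c: n / len(neighbor_labels) for c, n in counts.items()}
# {'A': 0.5, 'B': 0.3, 'C': 0.2}
```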