Topic 14 Support Vector Machines (Part 1)

14.1 Exercises

You can download a template RMarkdown file to start from here.

NOTE: completing these exercises (Part 1 today, and Part 2 on Monday) is your Homework 6, due Wednesday, April 3 at midnight. You do not need to hand in exercises that involve drawing.




These exercises are conceptual and will guide us through thinking about the mathematical formulation of the optimization problems underlying the maximal margin classifier and the support vector classifier.

We have the following dataset of 7 cases:

##   x1 x2    y
## 1  3  4  Red
## 2  2  2  Red
## 3  4  4  Red
## 4  1  4  Red
## 5  2  1 Blue
## 6  4  3 Blue
## 7  4  1 Blue
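If you would rather sketch on top of a computer plot than on paper, the dataset printed above can be rebuilt and plotted in base R. (The axis limits below are an arbitrary choice, not part of the exercise.)

```r
# The seven cases, rebuilt as a data frame matching the printed output above
dat <- data.frame(
  x1 = c(3, 2, 4, 1, 2, 4, 4),
  x2 = c(4, 2, 4, 4, 1, 3, 1),
  y  = c("Red", "Red", "Red", "Red", "Blue", "Blue", "Blue")
)

# Base-R scatterplot of the cases, colored by class, to sketch on top of
plot(dat$x1, dat$x2, col = dat$y, pch = 19,
     xlab = "x1", ylab = "x2", xlim = c(0, 5), ylim = c(0, 5))
```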

  1. Maximal margin hyperplane (related to Exercise 3 in Section 9.7 of ISLR)
    1. Sketch the maximal margin hyperplane and provide an equation for this hyperplane in the form $\beta_0 + \beta_1 x_1 + \beta_2 x_2 = 0$. (It’ll be easier to write this first as $x_2 = m x_1 + b$.)
    2. Using the equation for your hyperplane, write down a classification rule for predicting “Red” and “Blue”.
    3. Let M be the margin for this hyperplane. Draw M on your diagram.
    4. Label the support vectors for this hyperplane on your diagram.
    5. Would moving case 7 a tiny bit move the hyperplane? Why or why not?
    6. Would moving case 5 a tiny bit move the hyperplane? Why or why not?
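Once you have a candidate classification rule, you can sanity-check it numerically. The coefficients below are placeholders chosen only to illustrate the check, not the answer to the exercise; substitute the ones from your own hyperplane.

```r
# Check a candidate rule "predict Red if f(x) > 0, else Blue",
# where f(x) = b0 + b1*x1 + b2*x2.
b0 <- 1; b1 <- -1; b2 <- 1   # placeholder coefficients, NOT the answer

dat <- data.frame(
  x1 = c(3, 2, 4, 1, 2, 4, 4),
  x2 = c(4, 2, 4, 4, 1, 3, 1),
  y  = c("Red", "Red", "Red", "Red", "Blue", "Blue", "Blue")
)

f <- b0 + b1 * dat$x1 + b2 * dat$x2
pred <- ifelse(f > 0, "Red", "Blue")
data.frame(dat, f, pred, correct = pred == dat$y)
```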




  2. Let’s keep working with the maximal margin hyperplane you drew above. But now, one more case has been added to the previous dataset: x1 = 3, x2 = 2, y = "Red".
    1. Does your maximal margin hyperplane still perfectly separate the classes?
    2. Is there any hyperplane that can perfectly separate the classes?




Recall from the video that the optimization problem for a support vector classifier is as follows.

We want to find the coefficients $\beta_0, \beta_1, \ldots, \beta_p$ and values of the slack variables $\varepsilon_1, \ldots, \varepsilon_n$ in order to:

  • Maximize $M$ (the margin)
  • With the constraints that:
    • $\sum_{j=1}^p \beta_j^2 = 1$
    • $y_i(\beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip}) \ge M(1 - \varepsilon_i)$ for all $i = 1, \ldots, n$
    • $\varepsilon_i \ge 0$
    • $\sum_{i=1}^n \varepsilon_i \le C$




  3. Argue that for the bigger dataset, if we use a budget C=2, the original maximal margin hyperplane from Exercise 1 would still satisfy the constraints. (It may not be the hyperplane with the biggest margin though.)
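A numerical way to check your argument: when the coefficients are scaled so that $\beta_1^2 + \beta_2^2 = 1$, the margin constraint rearranges to $\varepsilon_i = \max(0,\, 1 - y_i f(x_i)/M)$, so the slacks can be computed directly. The coefficients and $M$ below are placeholders, not the answer; plug in your own hyperplane and margin and compare the total slack to the budget.

```r
# The 8-case dataset, coding Red = +1 and Blue = -1
dat <- data.frame(
  x1 = c(3, 2, 4, 1, 2, 4, 4, 3),
  x2 = c(4, 2, 4, 4, 1, 3, 1, 2),
  y  = c(1, 1, 1, 1, -1, -1, -1, 1)
)

# Placeholder hyperplane, scaled so b1^2 + b2^2 = 1, and placeholder margin
b0 <- 1 / sqrt(2); b1 <- -1 / sqrt(2); b2 <- 1 / sqrt(2)
M  <- 0.25

f   <- b0 + b1 * dat$x1 + b2 * dat$x2
eps <- pmax(0, 1 - dat$y * f / M)   # slack for each case
sum(eps)                            # compare this total to the budget C
```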




  4. Look at Figure 9.7 from ISLR. (View the linked figure; don’t read the caption in the book.) The solid black line is the hyperplane estimated via the support vector classifier optimization problem. The dashed lines are the margins.
    1. Order the panels in terms of lowest budget (C) to highest budget.
    2. How does a high budget relate to the following?
      • Tolerance for cases violating the margin (being on the wrong side)
      • Size of the margin
      • Number of support vectors
      • Variance of the hyperplane
      • Bias of the hyperplane
    3. caret and other software tend not to use the budget C as a parameter. Instead, they have a parameter called cost (unfortunately, also denoted with C): a constant reflecting the price paid for each case that violates the margin. How does budget relate to cost?
    4. How can we choose a good value for cost?
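One common answer to the last question is cross-validation. A hedged sketch with caret is below: `method = "svmLinear"` has a single tuning parameter named `C` (the cost), and `train()` will pick the value with the best cross-validated accuracy. The tiny dataset, seed, fold count, and grid of cost values here are arbitrary illustrative choices; in practice you would use your real training data. (This assumes the caret and kernlab packages are installed.)

```r
# Sketch: choosing cost by cross-validation with caret
library(caret)  # train() will also need the kernlab package for svmLinear

dat <- data.frame(
  x1 = c(3, 2, 4, 1, 2, 4, 4),
  x2 = c(4, 2, 4, 4, 1, 3, 1),
  y  = factor(c("Red", "Red", "Red", "Red", "Blue", "Blue", "Blue"))
)

set.seed(74)
fit <- train(
  y ~ x1 + x2, data = dat,
  method    = "svmLinear",
  trControl = trainControl(method = "cv", number = 3),
  tuneGrid  = data.frame(C = c(0.01, 0.1, 1, 10, 100))
)
fit$bestTune  # the cost value with the best cross-validated accuracy
```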




  1. The “model” resulting from support vector methods is the equation for a hyperplane f(x)=β0+β1x1++βpxp. For a new test case x, we normally make a classification by seeing if f(x) is greater than or less than 0. How could we modify this thinking to generate an ROC curve for an SVM?
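As a hint for thinking about this, the key idea can be sketched numerically: treat $f(x)$ as a score, classify "Red" when the score exceeds a threshold $t$, and sweep $t$ over a range of values, computing the true and false positive rates at each. The decision values and labels below are made up purely for illustration.

```r
# Made-up decision values f(x) and true labels for some test cases
f_test <- c(2.1, 0.4, -0.3, 1.2, -1.5, 0.2, -0.8)
y_test <- c("Red", "Red", "Blue", "Red", "Blue", "Blue", "Blue")

# Sweep the threshold t over all distinct score values (plus the extremes)
thresholds <- sort(unique(c(-Inf, f_test, Inf)))
roc <- t(sapply(thresholds, function(t) {
  pred <- ifelse(f_test > t, "Red", "Blue")
  c(fpr = mean(pred[y_test == "Blue"] == "Red"),   # false positive rate
    tpr = mean(pred[y_test == "Red"]  == "Red"))   # true positive rate
}))
roc  # each row is one (FPR, TPR) point on the ROC curve
```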




Extra! Read Section 9.4 of ISLR to learn about how to use SVMs for more than 2 classes.