Q1. (10 points) Answer each of the following with yes or no, along with proper justification.
a. Is the decision boundary of voted perceptron linear?
b. Is the decision boundary of averaged perceptron linear?
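For concreteness, here is a minimal sketch (in Python, with illustrative names; not part of the question) of the two prediction rules: the voted perceptron aggregates the signs of every intermediate weight vector, while the averaged perceptron predicts with a single averaged weight vector.

```python
import numpy as np

def voted_predict(weights, counts, x):
    """Voted perceptron: each stored weight vector w_m casts c_m votes
    for sign(w_m . x), and the majority vote wins. The inner sign()
    terms make the vote a nonlinear function of x in general."""
    votes = sum(c * np.sign(w @ x) for w, c in zip(weights, counts))
    return np.sign(votes)

def averaged_predict(weights, counts, x):
    """Averaged perceptron: predict with the single count-weighted
    average of the stored weight vectors, a fixed linear rule."""
    w_avg = sum(c * w for w, c in zip(weights, counts))
    return np.sign(w_avg @ x)
```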
Q2. (10 points) Consider the following setting. You are provided with n training examples:
$(x_1, y_1, h_1), (x_2, y_2, h_2), \cdots, (x_n, y_n, h_n)$, where $x_i$ is the input example, $y_i$ is the class label ($+1$ or $-1$), and $h_i > 0$ is the importance weight of the example. The teacher gave you some
additional information by specifying the importance of each training example. How will you
modify the perceptron algorithm to be able to leverage this extra information? Please justify
your answer.
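One natural direction, shown as a hedged sketch below (function and parameter names are illustrative): treat an importance weight $h_i$ as if example $i$ appeared $h_i$ times, which amounts to scaling each mistake-driven update by $h_i$.

```python
import numpy as np

def importance_weighted_perceptron(X, y, h, epochs=10):
    """Perceptron sketch where a mistake on example i triggers an
    update scaled by its importance weight h[i], so more important
    examples move the weight vector proportionally further."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x_i, y_i, h_i in zip(X, y, h):
            if y_i * (w @ x_i) <= 0:        # mistake on example i
                w += h_i * y_i * x_i         # importance-scaled update
    return w
```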
Q3. (10 points) Consider the following setting. You are provided with n training examples:
$(x_1, y_1), (x_2, y_2), \cdots, (x_n, y_n)$, where $x_i$ is the input example, and $y_i$ is the class label ($+1$ or $-1$). However, the training data is highly imbalanced (say 90% of the examples are negative
and 10% of the examples are positive) and we care more about the accuracy of positive
examples. How will you modify the perceptron algorithm to solve this learning problem?
Please justify your answer.
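One possible modification, sketched below under the assumption that mistakes on positives should cost roughly the inverse class ratio more (the `pos_cost` value is an illustrative choice, not prescribed by the question): use class-dependent update magnitudes so that errors on the rare positive class push the boundary harder.

```python
import numpy as np

def cost_sensitive_perceptron(X, y, pos_cost=9.0, epochs=10):
    """Perceptron sketch with asymmetric updates: a mistake on a
    positive example is scaled by pos_cost (e.g., ~9 for a 90/10
    split), trading some negative accuracy for positive accuracy."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x_i, y_i in zip(X, y):
            if y_i * (w @ x_i) <= 0:                  # mistake
                cost = pos_cost if y_i > 0 else 1.0   # class-dependent cost
                w += cost * y_i * x_i
    return w
```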
Q4. (20 points)
You were just hired by MetaMind. MetaMind is expanding rapidly, and you decide to use
your machine learning skills to assist them in their attempts to hire the best. To do so, you
have the following available to you for each candidate i in the pool of candidates I: (i) Their
GPA, (ii) Whether they took the Data Mining course and achieved an A, (iii) Whether they took the Algorithms course and achieved an A, (iv) Whether they have a job offer from Google, (v)
Whether they have a job offer from Facebook, (vi) The number of misspelled words on their
resume. You decide to represent each candidate i ∈ I by a corresponding 6-dimensional
feature vector $f(x^{(i)})$. You believe that if you just knew the right weight vector $w \in \mathbb{R}^6$ you could reliably predict the quality of a candidate $i$ by computing $w \cdot f(x^{(i)})$. To determine $w$
your boss lets you sample pairs of candidates from the pool. For a pair of candidates (k, l)
you can have them face off in a “DataMining-fight.” The result is $\mathrm{score}(k \succ l)$, which tells you that candidate $k$ is at least $\mathrm{score}(k \succ l)$ better than candidate $l$. Note that the score will be negative when $l$ is a better candidate than $k$. Assume you collected scores for a set of pairs of candidates $P$.
Describe how you could use a perceptron-based algorithm to learn the weight vector $w$. Make sure to describe the basic intuition, how the weight updates will be done, and pseudo-code for the entire algorithm.
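A hedged sketch of one such scheme (data layout and names are assumptions for illustration): each observed pair gives the constraint $w \cdot \big(f(x^{(k)}) - f(x^{(l)})\big) \ge \mathrm{score}(k \succ l)$, and a perceptron-style update on the difference vector is made whenever the constraint is violated.

```python
import numpy as np

def ranking_perceptron(pairs, dim=6, epochs=50):
    """Perceptron sketch over candidate pairs. Each element of `pairs`
    is (f_k, f_l, s): the 6-d feature vectors of candidates k and l
    and the observed margin s = score(k > l), which may be negative.
    We want w.(f_k - f_l) >= s; on a violation, update w with the
    difference vector, which raises w.(f_k - f_l) by its squared norm."""
    w = np.zeros(dim)
    for _ in range(epochs):
        for f_k, f_l, s in pairs:
            diff = f_k - f_l
            if w @ diff < s:      # pair ranked with too small a margin
                w += diff          # perceptron-style update on the pair
    return w
```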
Q5. (30 points) Suppose we have $n_+$ positive training examples and $n_-$ negative training examples. Let $C_+$ be the center of the positive examples and $C_-$ be the center of the negative examples, i.e., $C_+ = \frac{1}{n_+} \sum_{i:\, y_i = +1} x_i$ and $C_- = \frac{1}{n_-} \sum_{i:\, y_i = -1} x_i$. Consider a simple classifier
called CLOSE that classifies a test example $x$ by assigning it to the class whose center is
closest.
• Show that the decision boundary of the CLOSE classifier is a linear hyperplane of the form $\mathrm{sign}(w \cdot x + b)$. Compute the values of $w$ and $b$ in terms of $C_+$ and $C_-$.
• Recall that the weight vector can be written as a linear combination of all the training examples: $w = \sum_{i=1}^{n_+ + n_-} \alpha_i \, y_i \, x_i$. Compute the dual weights ($\alpha$'s). How many of the training examples are support vectors?
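For reference, a sketch of the geometry behind the first bullet (an outline under the stated definitions, not the required write-up): expand the two squared distances to $C_+$ and $C_-$ and cancel the common $\|x\|^2$ term:

\[
\|x - C_-\|^2 - \|x - C_+\|^2 = 2\,(C_+ - C_-) \cdot x + \|C_-\|^2 - \|C_+\|^2,
\]

so CLOSE predicts $+1$ exactly when this difference is positive, which matches $\mathrm{sign}(w \cdot x + b)$ with $w = 2\,(C_+ - C_-)$ and $b = \|C_-\|^2 - \|C_+\|^2$. For the second bullet, substituting the definitions of $C_+$ and $C_-$ into this $w$ and matching coefficients against $w = \sum_i \alpha_i y_i x_i$ reads off the $\alpha$'s.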
Q6. (20 points) Please read the following paper and write a brief summary of the main
points in at most TWO pages.
Pedro M. Domingos: A few useful things to know about machine learning. Communications of the ACM 55(10): 78-87 (2012).
https://homes.cs.washington.edu/~pedrod/papers/cacm12.pdf
Sample Solution