2. (1)

1 point possible (graded, results hidden)
If we again use the linear perceptron algorithm to train the classifier, what will happen?

Note: In the choices below, "converge" means that, given a certain input, the algorithm terminates with a fixed output within finitely many steps (assume the number of iterations allowed is very large: the output of the algorithm will not change as we allow more iterations). Otherwise we say the algorithm diverges (even for an extremely large number of iterations, the output of the algorithm keeps changing as we run it further).

The algorithm always converges and we get a classifier that perfectly classifies the training dataset.

The algorithm always converges and we get a classifier that does not perfectly classify the training dataset.

The algorithm will never converge.

The algorithm might converge for some initializations and certain orderings of the data, but will diverge otherwise. When it converges, we always get a classifier that does not perfectly classify the training dataset.

The algorithm might converge for some initializations and certain orderings of the data, but will diverge otherwise. When it converges, we always get a classifier that perfectly classifies the training dataset.
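To see why the linear perceptron never terminates on data that is not linearly separable, it helps to look at the update rule itself. Below is a minimal Python sketch (not part of the original problem; the plotted dataset is not reproduced here, so a toy 1-D non-separable set stands in for it):

```python
def perceptron_epoch(data, theta, theta_0):
    """One pass of the perceptron update rule over the data."""
    mistakes = 0
    for x, y in data:
        if y * (theta * x + theta_0) <= 0:  # misclassified (or on the boundary)
            theta += y * x
            theta_0 += y
            mistakes += 1
    return theta, theta_0, mistakes

# Toy non-separable 1-D set: the +1 point lies between the two -1 points,
# so no (theta, theta_0) classifies all three correctly, and every pass
# must trigger at least one update -- the algorithm never converges.
data = [(-1.0, -1), (0.0, +1), (1.0, -1)]
theta, theta_0 = 0.0, 0.0
for _ in range(1000):
    theta, theta_0, mistakes = perceptron_epoch(data, theta, theta_0)
print(mistakes)  # still nonzero after 1000 passes
```

Because some point is always misclassified, every pass changes the parameters, which is exactly the "diverge" behavior defined in the note above.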

2. (2)
2 points possible (graded, results hidden)
We decide to run the kernel perceptron algorithm over this dataset using the quadratic kernel. The number of mistakes made on each point is displayed in the table below. (These points correspond to those in the plot above.)

Label                 -1     -1     -1     -1     -1     +1     +1     +1     +1     +1
Coordinates          (0,0)  (2,0)  (1,1)  (0,2)  (3,3)  (4,1)  (5,2)  (1,4)  (4,4)  (5,5)
Perceptron mistakes    1     65     11     31     72     30      0     21      4     15
Define the feature map of our quadratic kernel to be:

phi(x) = [x_1^2, sqrt(2)*x_1*x_2, x_2^2]^T

Assume all parameters are set to zero before running the algorithm.

Based on the table, what is the output of theta_0 and theta?

(Enter accurate to at least 2 decimal places.)
theta_0=
theta=


(Enter theta as a vector, enclosed in square brackets, with components separated by commas, e.g. type [0,1] for a vector whose components are 0 and 1. Note that this sample vector may not have the same dimension as the answer. Enter each component accurate to at least 2 decimal places.)

2. (3)
1 point possible (graded, results hidden)
Based on the calculation of theta_0 and theta, does the decision boundary correctly classify all the points in the training dataset?

2. (4)
1 point possible (graded, results hidden)
Recall for x=[x_1,x_2]^T the feature map

phi(x) = [x_1^2, sqrt(2)*x_1*x_2, x_2^2]^T

Define the kernel function

K(x,x^') = phi(x)^T phi(x^')

Write K(x,x^') as a function of the dot product x^T x^'.

Answers:

2. (1)
The algorithm will never converge.

2. (2)
theta_0= -110.00
theta= [21.00, -22.63, 22.00]

(Each mistake on point i adds y_i to theta_0 and y_i*phi(x_i) to theta, so theta_0 = sum_i m_i*y_i = 70 - 180 = -110 and theta = sum_i m_i*y_i*phi(x_i) = [21, -16*sqrt(2), 22].)
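As a sanity check, the accumulated updates can be recomputed directly from the mistake table. A minimal Python sketch (not part of the original problem; variable names are illustrative):

```python
import math

# Kernel perceptron with zero initialization accumulates
#   theta   = sum_i m_i * y_i * phi(x_i)
#   theta_0 = sum_i m_i * y_i
# where m_i is the number of mistakes on point i.
labels   = [-1, -1, -1, -1, -1, +1, +1, +1, +1, +1]
points   = [(0, 0), (2, 0), (1, 1), (0, 2), (3, 3),
            (4, 1), (5, 2), (1, 4), (4, 4), (5, 5)]
mistakes = [1, 65, 11, 31, 72, 30, 0, 21, 4, 15]

def phi(x):
    x1, x2 = x
    return (x1 ** 2, math.sqrt(2) * x1 * x2, x2 ** 2)

theta, theta_0 = [0.0, 0.0, 0.0], 0.0
for y, x, m in zip(labels, points, mistakes):
    for j, v in enumerate(phi(x)):
        theta[j] += m * y * v
    theta_0 += m * y

print(theta_0)                       # -110.0
print([round(t, 2) for t in theta])  # [21.0, -22.63, 22.0]
```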

2. (3)
Yes, the decision boundary correctly classifies all the points in the training dataset.
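This can be verified numerically. The sketch below (not part of the original problem) hardcodes theta_0 = -110 and theta = (21, -16*sqrt(2), 22), the values obtained by accumulating the mistake-table updates in the phi(x) = [x1^2, sqrt(2)*x1*x2, x2^2] feature space, and checks the sign of the score on every training point:

```python
import math

labels = [-1, -1, -1, -1, -1, +1, +1, +1, +1, +1]
points = [(0, 0), (2, 0), (1, 1), (0, 2), (3, 3),
          (4, 1), (5, 2), (1, 4), (4, 4), (5, 5)]
theta = (21.0, -16.0 * math.sqrt(2), 22.0)
theta_0 = -110.0

def score(x):
    """Decision value theta . phi(x) + theta_0 in the quadratic feature space."""
    x1, x2 = x
    feat = (x1 ** 2, math.sqrt(2) * x1 * x2, x2 ** 2)
    return sum(t * f for t, f in zip(theta, feat)) + theta_0

# Every point is correctly classified iff y * score(x) > 0 for all of them.
all_correct = all(y * score(x) > 0 for x, y in zip(points, labels))
print(all_correct)  # True
```

This is consistent with the finite mistake counts in the table: the kernel perceptron stopped making mistakes, which can only happen once every training point is on the correct side of the boundary.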

2. (4)
K(x,x^') = phi(x)^T phi(x^') = x_1^2 * x_1'^2 + 2 * x_1 * x_2 * x_1' * x_2' + x_2^2 * x_2'^2 = (x^T x^')^2
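The algebraic identity can be spot-checked numerically. A short Python sketch (not part of the original problem) compares phi(x) . phi(x') against (x . x')^2 on random inputs:

```python
import math
import random

def phi(x):
    """Quadratic feature map [x1^2, sqrt(2)*x1*x2, x2^2]."""
    x1, x2 = x
    return (x1 ** 2, math.sqrt(2) * x1 * x2, x2 ** 2)

def dot(a, b):
    return sum(u * v for u, v in zip(a, b))

random.seed(0)
for _ in range(100):
    x  = (random.uniform(-5, 5), random.uniform(-5, 5))
    xp = (random.uniform(-5, 5), random.uniform(-5, 5))
    # phi(x) . phi(x') should equal (x . x')^2 up to floating-point error.
    assert abs(dot(phi(x), phi(xp)) - dot(x, xp) ** 2) < 1e-6
print("identity holds")
```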