2. (1)

1 point possible (graded, results hidden)
If we again use the linear perceptron algorithm to train the classifier, what will happen?

Note: In the choices below, "converge" means that, given a certain input, the algorithm terminates with a fixed output within finitely many steps (assume the number of iterations allowed is very large: the output of the algorithm will not change as we allow more iterations). Otherwise we say the algorithm diverges (even for an extremely large number of iterations, the output of the algorithm keeps changing as we run it further).

The algorithm always converges and we get a classifier that perfectly classifies the training dataset.

The algorithm always converges and we get a classifier that does not perfectly classify the training dataset.

The algorithm will never converge.

The algorithm might converge for some initializations and certain orderings of the data, but will diverge otherwise. When it converges, we always get a classifier that does not perfectly classify the training dataset.

The algorithm might converge for some initializations and certain orderings of the data, but will diverge otherwise. When it converges, we always get a classifier that perfectly classifies the training dataset.
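To see why the linear perceptron never terminates on data that is not linearly separable, it helps to look at the update rule itself. Below is a minimal Python sketch (not part of the original problem; the plotted dataset is not reproduced here, so a toy 1-D non-separable set stands in for it):

```python
def perceptron_epoch(data, theta, theta_0):
    """One pass of the perceptron update rule over the data."""
    mistakes = 0
    for x, y in data:
        if y * (theta * x + theta_0) <= 0:  # misclassified (or on the boundary)
            theta += y * x
            theta_0 += y
            mistakes += 1
    return theta, theta_0, mistakes

# Toy non-separable 1-D set: the +1 point lies between the two -1 points,
# so no (theta, theta_0) classifies all three correctly, and every pass
# must trigger at least one update -- the algorithm never converges.
data = [(-1.0, -1), (0.0, +1), (1.0, -1)]
theta, theta_0 = 0.0, 0.0
for _ in range(1000):
    theta, theta_0, mistakes = perceptron_epoch(data, theta, theta_0)
print(mistakes)  # still nonzero after 1000 passes
```

Because some point is always misclassified, every pass changes the parameters, which is exactly the "diverge" behavior defined in the note above.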

2. (2)
2 points possible (graded, results hidden)
We decide to run the kernel perceptron algorithm over this dataset using the quadratic kernel. The number of mistakes made on each point is displayed in the table below. (These points correspond to those in the plot above.)

Label                 -1     -1     -1     -1     -1     +1     +1     +1     +1     +1
Coordinates          (0,0)  (2,0)  (1,1)  (0,2)  (3,3)  (4,1)  (5,2)  (1,4)  (4,4)  (5,5)
Perceptron mistakes    1     65     11     31     72     30      0     21      4     15
Define the feature map of our quadratic kernel to be:

phi(x) = [x_1^2, sqrt(2)*x_1*x_2, x_2^2]^T

Assume all parameters are set to zero before running the algorithm.

Based on the table, what is the output of theta_0 and theta?

(Enter accurate to at least 2 decimal places.)
theta_0=
theta=


(Enter theta as a vector, enclosed in square brackets, with components separated by commas, e.g. type [0,1] for a vector whose components are 0 and 1. Note that this sample vector may not have the same dimension as the answer. Enter each component accurate to at least 2 decimal places.)

2. (3)
1 point possible (graded, results hidden)
Based on the calculation of theta_0 and theta, does the decision boundary correctly classify all the points in the training dataset?

2. (4)
1 point possible (graded, results hidden)
Recall for x=[x_1,x_2]^T the feature map

phi(x) = [x_1^2, sqrt(2)*x_1*x_2, x_2^2]^T

Define the kernel function

K(x,x^') = phi(x)^T phi(x^')

Write K(x,x^') as a function of the dot product x^T x^'.

Answers:

2. (1)
The algorithm will never converge.

2. (2)
theta_0= -110.00
theta= [21.00, -22.63, 22.00]

(Each mistake on point i adds y_i to theta_0 and y_i*phi(x_i) to theta, so theta_0 = sum_i m_i*y_i = 70 - 180 = -110 and theta = sum_i m_i*y_i*phi(x_i) = [21, -16*sqrt(2), 22].)
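As a sanity check, the accumulated updates can be recomputed directly from the mistake table. A minimal Python sketch (not part of the original problem; variable names are illustrative):

```python
import math

# Kernel perceptron with zero initialization accumulates
#   theta   = sum_i m_i * y_i * phi(x_i)
#   theta_0 = sum_i m_i * y_i
# where m_i is the number of mistakes on point i.
labels   = [-1, -1, -1, -1, -1, +1, +1, +1, +1, +1]
points   = [(0, 0), (2, 0), (1, 1), (0, 2), (3, 3),
            (4, 1), (5, 2), (1, 4), (4, 4), (5, 5)]
mistakes = [1, 65, 11, 31, 72, 30, 0, 21, 4, 15]

def phi(x):
    x1, x2 = x
    return (x1 ** 2, math.sqrt(2) * x1 * x2, x2 ** 2)

theta, theta_0 = [0.0, 0.0, 0.0], 0.0
for y, x, m in zip(labels, points, mistakes):
    for j, v in enumerate(phi(x)):
        theta[j] += m * y * v
    theta_0 += m * y

print(theta_0)                       # -110.0
print([round(t, 2) for t in theta])  # [21.0, -22.63, 22.0]
```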

2. (3)
Yes, the decision boundary correctly classifies all the points in the training dataset.
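This can be verified numerically. The sketch below (not part of the original problem) hardcodes theta_0 = -110 and theta = (21, -16*sqrt(2), 22), the values obtained by accumulating the mistake-table updates in the phi(x) = [x1^2, sqrt(2)*x1*x2, x2^2] feature space, and checks the sign of the score on every training point:

```python
import math

labels = [-1, -1, -1, -1, -1, +1, +1, +1, +1, +1]
points = [(0, 0), (2, 0), (1, 1), (0, 2), (3, 3),
          (4, 1), (5, 2), (1, 4), (4, 4), (5, 5)]
theta = (21.0, -16.0 * math.sqrt(2), 22.0)
theta_0 = -110.0

def score(x):
    """Decision value theta . phi(x) + theta_0 in the quadratic feature space."""
    x1, x2 = x
    feat = (x1 ** 2, math.sqrt(2) * x1 * x2, x2 ** 2)
    return sum(t * f for t, f in zip(theta, feat)) + theta_0

# Every point is correctly classified iff y * score(x) > 0 for all of them.
all_correct = all(y * score(x) > 0 for x, y in zip(points, labels))
print(all_correct)  # True
```

This is consistent with the finite mistake counts in the table: the kernel perceptron stopped making mistakes, which can only happen once every training point is on the correct side of the boundary.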

2. (4)
K(x,x^') = phi(x)^T phi(x^') = x_1^2 * x_1'^2 + 2 * x_1 * x_2 * x_1' * x_2' + x_2^2 * x_2'^2 = (x^T x^')^2
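The algebraic identity can be spot-checked numerically. A short Python sketch (not part of the original problem) compares phi(x) . phi(x') against (x . x')^2 on random inputs:

```python
import math
import random

def phi(x):
    """Quadratic feature map [x1^2, sqrt(2)*x1*x2, x2^2]."""
    x1, x2 = x
    return (x1 ** 2, math.sqrt(2) * x1 * x2, x2 ** 2)

def dot(a, b):
    return sum(u * v for u, v in zip(a, b))

random.seed(0)
for _ in range(100):
    x  = (random.uniform(-5, 5), random.uniform(-5, 5))
    xp = (random.uniform(-5, 5), random.uniform(-5, 5))
    # phi(x) . phi(x') should equal (x . x')^2 up to floating-point error.
    assert abs(dot(phi(x), phi(xp)) - dot(x, xp) ** 2) < 1e-6
print("identity holds")
```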