If we again use the linear perceptron algorithm to train the classifier, what will happen?

Note: In the choices below ,“converge" means given a certain input, the algorithm will terminate with a fixed output within finite steps (assume is very large: the output of the algorithm will not change as we increase ). Otherwise we say the algorithm diverges (even for an extremely large , the output of the algorithm will change as we increase further).

We decide to run the kernel perceptron algorithm over this dataset using the quadratic kernel. The number of mistakes made on each point is displayed in the table below. (These points correspond to those in the plot above.)

Label -1 -1 -1 -1 -1 +1 +1 +1 +1 +1
Coordinates (0,0) (2,0) (1,1) (0,2) (3,3) (4,1) (5,2) (1,4) (4,4) (5,5)
Perceptron mistakes 1 65 11 31 72 30 0 21 4 15

1 answer

It is not clear what will happen if we use the linear perceptron algorithm without more information on the dataset and its linear separability. It may converge if the dataset is linearly separable or it may not converge if it is not.

However, we do know that the kernel perceptron algorithm with quadratic kernel has been run and made some mistakes on each point. This suggests that the dataset is not linearly separable and the quadratic kernel was used to transform the input space to a higher dimensional space where the data is linearly separable.