Stochastic gradient descent (SGD) is a simple but widely applicable optimization technique. For example, we can use it to train a Support Vector Machine. The objective function in this case is given by:

$$J(\theta) \;=\; \frac{1}{n}\sum_{i=1}^{n} \mathrm{Loss}_h\!\left(y^{(i)}\,\theta\cdot x^{(i)}\right) \;+\; \frac{\lambda}{2}\,\|\theta\|^{2}$$

where $\mathrm{Loss}_h(z) = \max\{0,\,1-z\}$ is the hinge loss function, $\left(x^{(i)}, y^{(i)}\right)$ for $i = 1,\dots,n$ are the training examples, with $y^{(i)}$ being the label for the vector $x^{(i)}$.

For simplicity, we ignore the offset parameter $\theta_0$ in all problems on this page.
The stochastic gradient update rule involves the gradient of $\mathrm{Loss}_h(y\,\theta\cdot x)$ with respect to $\theta$.

(Hint: Recall that for a $d$-dimensional vector $\theta$, the gradient of $\theta\cdot x$ w.r.t. $\theta$ is $x$.)
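
To spell the hint out (a short worked version, not part of the original problem statement): writing the dot product componentwise,

$$\nabla_\theta(\theta\cdot x) \;=\; \left(\frac{\partial}{\partial\theta_1}\sum_{j=1}^{d}\theta_j x_j,\;\dots,\;\frac{\partial}{\partial\theta_d}\sum_{j=1}^{d}\theta_j x_j\right) \;=\; (x_1,\dots,x_d) \;=\; x.$$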

Find $\nabla_\theta\,\mathrm{Loss}_h(y\,\theta\cdot x)$ in terms of $x$.

(Enter y for $y$ and x for the vector $x$. Use * for multiplication between scalars and vectors, or for dot products between vectors. Use 0 for the zero vector.)

For $y\,\theta\cdot x \le 1$:

2 answers

The gradient of $\mathrm{Loss}_h(y\,\theta\cdot x)$ with respect to $\theta$ is: $-y * x$ for $y\,\theta\cdot x \le 1$, and $0$ (the zero vector) for $y\,\theta\cdot x > 1$.
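
A short derivation supporting this (an editorial sketch using the chain rule, in the notation of the problem statement): write $z = y\,\theta\cdot x$. Since $\mathrm{Loss}_h(z) = \max\{0,\,1-z\}$,

$$\frac{d}{dz}\,\mathrm{Loss}_h(z) = \begin{cases} -1, & z < 1 \\ 0, & z > 1 \end{cases}$$

and, using $\nabla_\theta(y\,\theta\cdot x) = y\,x$ from the hint, the chain rule gives

$$\nabla_\theta\,\mathrm{Loss}_h(y\,\theta\cdot x) = \begin{cases} -y\,x, & y\,\theta\cdot x \le 1 \\ 0, & y\,\theta\cdot x > 1 \end{cases}$$

(at $y\,\theta\cdot x = 1$ the hinge loss is not differentiable; $-y\,x$ is the standard subgradient choice, matching the "$\le 1$" case above).
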
proofread your posts - this one is missing vital information.
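
To tie the thread together, here is a minimal, self-contained Python sketch of one stochastic (sub)gradient step on the objective above. The names (hinge_subgradient, sgd_step, eta, lam) and the choice to fold the regularizer's gradient into the per-example step are illustrative assumptions, not part of the original problem.

import numpy as np

def hinge_subgradient(theta, x, y):
    # Subgradient of Loss_h(y * theta . x) with respect to theta:
    # -y * x when y * theta . x <= 1, otherwise the zero vector.
    if y * np.dot(theta, x) <= 1:
        return -y * x
    return np.zeros_like(theta)

def sgd_step(theta, x, y, lam, eta):
    # One per-example update with learning rate eta. Including the gradient of
    # the (lam/2) * ||theta||^2 term, i.e. lam * theta, is an assumption here;
    # the question above only asks for the gradient of the hinge-loss term.
    grad = hinge_subgradient(theta, x, y) + lam * theta
    return theta - eta * grad

# Toy usage with made-up numbers.
theta = np.zeros(2)
x, y = np.array([1.0, 2.0]), 1
theta = sgd_step(theta, x, y, lam=0.1, eta=0.5)
print(theta)  # [0.5 1. ]  (y * theta . x was 0 <= 1, so the subgradient was -x)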