Stochastic gradient descent (SGD) is a simple but widely applicable optimization technique. For example, we can use it to train a Support Vector Machine. The objective function in this case is given by:

$$J(\theta) \;=\; \frac{1}{n}\sum_{i=1}^{n} \mathrm{Loss}_h\!\left(y^{(i)}\,\theta\cdot x^{(i)}\right) \;+\; \frac{\lambda}{2}\,\|\theta\|^{2}$$

where $\mathrm{Loss}_h(z) = \max\{0,\,1-z\}$ is the hinge loss function, $\left(x^{(i)}, y^{(i)}\right)$ for $i = 1,\dots,n$ are the training examples, with $y^{(i)}$ being the label for the vector $x^{(i)}$.

For simplicity, we ignore the offset parameter $\theta_0$ in all problems on this page.
The stochastic gradient update rule involves the gradient of $\mathrm{Loss}_h(y\,\theta\cdot x)$ with respect to $\theta$.

(Hint: Recall that for a $d$-dimensional vector $\theta$, the gradient of $\theta\cdot x$ w.r.t. $\theta$ is $x$.)
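
To spell the hint out (a short worked version, not part of the original problem statement): writing the dot product componentwise,

$$\nabla_\theta(\theta\cdot x) \;=\; \left(\frac{\partial}{\partial\theta_1}\sum_{j=1}^{d}\theta_j x_j,\;\dots,\;\frac{\partial}{\partial\theta_d}\sum_{j=1}^{d}\theta_j x_j\right) \;=\; (x_1,\dots,x_d) \;=\; x.$$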

Find $\nabla_\theta\,\mathrm{Loss}_h(y\,\theta\cdot x)$ in terms of $x$.

(Enter y for $y$ and x for the vector $x$. Use * for multiplication between scalars and vectors, or for dot products between vectors. Use 0 for the zero vector.)

For $y\,\theta\cdot x \le 1$:

2 answers

The gradient of $\mathrm{Loss}_h(y\,\theta\cdot x)$ with respect to $\theta$ is: $-y * x$ for $y\,\theta\cdot x \le 1$, and $0$ (the zero vector) for $y\,\theta\cdot x > 1$.
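
A short derivation supporting this (an editorial sketch using the chain rule, in the notation of the problem statement): write $z = y\,\theta\cdot x$. Since $\mathrm{Loss}_h(z) = \max\{0,\,1-z\}$,

$$\frac{d}{dz}\,\mathrm{Loss}_h(z) = \begin{cases} -1, & z < 1 \\ 0, & z > 1 \end{cases}$$

and, using $\nabla_\theta(y\,\theta\cdot x) = y\,x$ from the hint, the chain rule gives

$$\nabla_\theta\,\mathrm{Loss}_h(y\,\theta\cdot x) = \begin{cases} -y\,x, & y\,\theta\cdot x \le 1 \\ 0, & y\,\theta\cdot x > 1 \end{cases}$$

(at $y\,\theta\cdot x = 1$ the hinge loss is not differentiable; $-y\,x$ is the standard subgradient choice, matching the "$\le 1$" case above).
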
proofread your posts - this one is missing vital information.
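
To tie the thread together, here is a minimal, self-contained Python sketch of one stochastic (sub)gradient step on the objective above. The names (hinge_subgradient, sgd_step, eta, lam) and the choice to fold the regularizer's gradient into the per-example step are illustrative assumptions, not part of the original problem.

import numpy as np

def hinge_subgradient(theta, x, y):
    # Subgradient of Loss_h(y * theta . x) with respect to theta:
    # -y * x when y * theta . x <= 1, otherwise the zero vector.
    if y * np.dot(theta, x) <= 1:
        return -y * x
    return np.zeros_like(theta)

def sgd_step(theta, x, y, lam, eta):
    # One per-example update with learning rate eta. Including the gradient of
    # the (lam/2) * ||theta||^2 term, i.e. lam * theta, is an assumption here;
    # the question above only asks for the gradient of the hinge-loss term.
    grad = hinge_subgradient(theta, x, y) + lam * theta
    return theta - eta * grad

# Toy usage with made-up numbers.
theta = np.zeros(2)
x, y = np.array([1.0, 2.0]), 1
theta = sgd_step(theta, x, y, lam=0.1, eta=0.5)
print(theta)  # [0.5 1. ]  (y * theta . x was 0 <= 1, so the subgradient was -x)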