Stochastic gradient descent (SGD) is a simple but widely applicable optimization technique. For example, we can use it to train a Support Vector Machine. The objective function in this case is given by:

$$J(\theta) = \frac{1}{n}\sum_{i=1}^{n} \mathrm{Loss}_h\!\left(y^{(i)}\,\theta \cdot x^{(i)}\right) + \frac{\lambda}{2}\,\lVert\theta\rVert^2$$

where $\mathrm{Loss}_h(z) = \max\{0,\, 1 - z\}$ is the hinge loss function, $\lambda > 0$ is the regularization parameter, and $(x^{(i)}, y^{(i)})$ for $i = 1, \ldots, n$ are the training examples, with $y^{(i)} \in \{1, -1\}$ being the label for the vector $x^{(i)}$.
For simplicity, we ignore the offset parameter $\theta_0$ in all problems on this page.
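For concreteness, here is a minimal NumPy sketch of this objective. The function name svm_objective, the argument names X, y, lam, and the default value of lam are illustrative assumptions, not part of the problem.

```python
import numpy as np

def svm_objective(theta, X, y, lam=1.0):
    """Average hinge loss plus L2 regularization, J(theta).

    X   : (n, d) array of training vectors x^(i)
    y   : (n,) array of labels in {1, -1}
    lam : regularization parameter lambda (assumed value)
    """
    margins = y * (X @ theta)                # y^(i) * theta . x^(i) for every example
    hinge = np.maximum(0.0, 1.0 - margins)   # Loss_h applied to each margin
    return hinge.mean() + 0.5 * lam * (theta @ theta)
```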
The stochastic gradient update rule involves the gradient of $\mathrm{Loss}_h(y\,\theta \cdot x)$ with respect to $\theta$.
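Written out, and ignoring the regularization term, the update on a single randomly chosen example $(x, y)$ takes the form below, where the step size $\eta > 0$ is an assumption not specified in the problem:

$$\theta \leftarrow \theta - \eta\, \nabla_\theta\, \mathrm{Loss}_h(y\,\theta \cdot x).$$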
(Hint: Recall that for a $k$-dimensional vector $\theta = [\theta_1, \ldots, \theta_k]^T$, the gradient of $f(\theta)$ w.r.t. $\theta$ is $\nabla_\theta f(\theta) = \left[\tfrac{\partial f}{\partial \theta_1}, \ldots, \tfrac{\partial f}{\partial \theta_k}\right]^T$.)
Find $\nabla_\theta\, \mathrm{Loss}_h(y\,\theta \cdot x)$ in terms of $x$ and $y$.
(Enter y for $y$ and x for the vector $x$. Use * for multiplication between scalars and vectors, or for dot products between vectors. Use 0 for the zero vector.)
For $y\,\theta \cdot x \le 1$:
The gradient of $\mathrm{Loss}_h(y\,\theta \cdot x)$ with respect to $\theta$ is $-y\,x$ when $y\,\theta \cdot x \le 1$, and the zero vector $0$ when $y\,\theta \cdot x > 1$. (At the kink $y\,\theta \cdot x = 1$ the hinge loss is not differentiable; $-y\,x$ is a valid subgradient there.)
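A short NumPy sketch of this piecewise gradient and the resulting single-example update follows; the helper names hinge_grad and sgd_step and the step size eta are assumptions for illustration.

```python
import numpy as np

def hinge_grad(theta, x, y):
    """Gradient of Loss_h(y * theta . x) w.r.t. theta:
    -y * x when the margin y * theta . x <= 1, else the zero vector."""
    if y * (theta @ x) <= 1:
        return -y * x
    return np.zeros_like(theta)

def sgd_step(theta, x, y, eta=0.1):
    """One stochastic gradient step on a single example (x, y),
    ignoring the regularization term as in the question above."""
    return theta - eta * hinge_grad(theta, x, y)

# Tiny usage check with made-up numbers: theta = 0, so the margin is 0 <= 1
theta = sgd_step(np.zeros(2), np.array([1.0, 2.0]), 1)
print(theta)  # [0.1 0.2] -- theta moves in the direction of y * x
```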