5. (1)
{(f1, f2): f1 > 0, f2 > 0}:
I --> True
f1-axis:
I --> True
II --> True
f2-axis:
I --> True
IV --> True
the origin (f1, f2) = (0, 0):
None of the above --> True
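For concreteness, the region a given input lands in after the ReLU hidden layer can be checked numerically. In the sketch below, W and b are hypothetical placeholders; the actual fixed hidden-layer parameters come from the problem statement and are not reproduced here.

import numpy as np

# Hypothetical hidden-layer parameters (placeholders, not the ones from the problem).
W = np.array([[1.0, -1.0],
              [-1.0, 1.0]])
b = np.array([0.0, 0.0])

def feature_region(x):
    """Apply the ReLU hidden layer and report where (f1, f2) lands."""
    f = np.maximum(0.0, W @ x + b)   # (f1, f2) after the two ReLU units
    if f[0] > 0 and f[1] > 0:
        return "open quadrant f1 > 0, f2 > 0"
    if f[0] > 0:
        return "f1-axis (f2 = 0)"
    if f[1] > 0:
        return "f2-axis (f1 = 0)"
    return "origin (0, 0)"

# One representative point from each quadrant of the input space.
for x in [np.array([1.0, 1.0]), np.array([-1.0, 1.0]),
          np.array([-1.0, -1.0]), np.array([1.0, -1.0])]:
    print(x, "->", feature_region(x))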
5. (2)
If we keep the hidden layer parameters above fixed but add and train additional hidden layers (applied after this layer) to further transform the data, could the resulting neural network solve this classification problem?
Yes
Suppose we stick to the 2-layer architecture but add many more ReLU hidden units, all of them without offset parameters. Would it be possible to train such a model to perfectly separate these points?
Yes
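A minimal training sketch of the second setup, a 2-layer architecture with many ReLU hidden units and no offset (bias) parameters, is below. The four points are a hypothetical XOR-style stand-in, since the actual data set from the problem is not reproduced here.

import torch
import torch.nn as nn

# Hypothetical XOR-style stand-in for the training points.
X = torch.tensor([[1., 1.], [-1., -1.], [1., -1.], [-1., 1.]])
y = torch.tensor([1., 1., 0., 0.])

# 2-layer architecture: many ReLU hidden units, all without offset parameters.
model = nn.Sequential(
    nn.Linear(2, 64, bias=False),
    nn.ReLU(),
    nn.Linear(64, 1, bias=False),
)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.BCEWithLogitsLoss()

for _ in range(2000):
    opt.zero_grad()
    loss = loss_fn(model(X).squeeze(1), y)
    loss.backward()
    opt.step()

# Thresholding the output at zero recovers the labels once training succeeds.
print((model(X).squeeze(1) > 0).long())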
5. (3)
The gradient calculated in the backpropagation algorithm consists of the partial derivatives of the loss function with respect to each network weight.
True
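One way to see this concretely is to compare the backpropagated partials against numerical partial derivatives of the loss with respect to an individual weight. The one-hidden-layer network below (sigmoid hidden units, squared-error loss) is an illustrative toy, not the architecture from the problem.

import numpy as np

rng = np.random.default_rng(0)
x, y = rng.normal(size=3), 1.0
W1, W2 = rng.normal(size=(4, 3)), rng.normal(size=4)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(W1, W2):
    h = sigmoid(W1 @ x)
    return 0.5 * (W2 @ h - y) ** 2

def backprop(W1, W2):
    h = sigmoid(W1 @ x)
    err = W2 @ h - y                              # dLoss/d(output)
    dW2 = err * h                                 # partials w.r.t. output weights
    dW1 = np.outer(err * W2 * h * (1 - h), x)     # partials w.r.t. hidden weights
    return dW1, dW2

# Finite-difference check of one arbitrary weight: the two numbers should agree.
dW1, _ = backprop(W1, W2)
eps = 1e-6
W1p = W1.copy(); W1p[2, 1] += eps
print(dW1[2, 1], (loss(W1p, W2) - loss(W1, W2)) / eps)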
Initialization of the parameters is often important when training large feed-forward neural networks.
True
If weights in a neural network with sigmoid units are initialized to values close to zero, then during early stochastic gradient descent steps the network represents a nearly linear function of the inputs.
True
On the other hand, if we randomly set all the weights to very large values, or don't scale them properly with the number of units in the layer below, then the sigmoid units would behave like sign units.
True
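Both claims follow from the shape of the sigmoid: near zero it is well approximated by its tangent line sigmoid(z) ≈ 1/2 + z/4, so a network whose weights are all close to zero composes approximately affine maps and is nearly linear in its inputs, while very large weights push the pre-activations far into the tails, where the output is essentially 0 or 1, like a (shifted) sign unit. A quick numerical check:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = np.linspace(-1.0, 1.0, 4)

# Small weights: the sigmoid stays close to its tangent line 1/2 + z/4,
# so the unit is approximately linear in its input.
small = 0.01 * z
print(np.max(np.abs(sigmoid(small) - (0.5 + small / 4))))   # ~1e-8

# Very large weights: the outputs are indistinguishable from 0 or 1,
# i.e. the unit behaves like a sign (threshold) unit.
large = 50.0 * z
print(sigmoid(large))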
If we use only sign units in a feedforward neural network, then the stochastic gradient descent update will
not change the weights at all (the sign function has zero derivative wherever it is differentiable, so the backpropagated gradient is zero almost everywhere)
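The loss of a sign-unit network is piecewise constant in the weights, so the partial derivatives vanish almost everywhere. A small illustrative check on a hypothetical toy network:

import numpy as np

rng = np.random.default_rng(1)
x, y = rng.normal(size=3), 1.0
W = rng.normal(size=(4, 3))   # hidden-layer weights
v = rng.normal(size=4)        # output-layer weights

def loss(W, v):
    h = np.sign(W @ x)        # sign hidden units
    out = np.sign(v @ h)      # sign output unit
    return 0.5 * (out - y) ** 2

# Perturbing a weight by a small amount (almost surely) leaves every sign,
# and hence the loss, unchanged -- the numerical partial derivative is 0.
eps = 1e-4
Wp = W.copy(); Wp[0, 0] += eps
print((loss(Wp, v) - loss(W, v)) / eps)   # 0.0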
Stochastic gradient descent differs from (true) gradient descent by updating only one network weight during each gradient descent step.
False
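The actual difference is which data the gradient is computed on, not which weights get updated: (true) gradient descent averages the gradient over the whole training set, while stochastic gradient descent estimates it from a single randomly chosen example, and both update every weight at each step. An illustrative least-squares comparison:

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + 0.1 * rng.normal(size=100)

w_gd, w_sgd, lr = np.zeros(5), np.zeros(5), 0.01
for _ in range(1000):
    # (True) gradient descent: gradient averaged over all training examples.
    w_gd -= lr * (X.T @ (X @ w_gd - y) / len(y))

    # Stochastic gradient descent: gradient of the loss on one random example,
    # but still an update of *all* five weights.
    i = rng.integers(len(y))
    w_sgd -= lr * (X[i] * (X[i] @ w_sgd - y[i]))

print(w_gd)    # both runs recover roughly the same weight vector
print(w_sgd)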
5. (4)
Since we apply the same convolutional filter throughout the image, we can learn to recognize the same feature wherever it appears.
True
A fully connected layer for a reasonably sized image would simply have too many parameters.
True
A fully connected layer can learn to recognize features anywhere in the image even if the features appeared preferentially in one location during training.
False (without weight sharing, units covering locations where the feature never appeared during training receive no signal to learn it)
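The parameter-count argument can be made concrete; the image and layer sizes below are illustrative and not taken from the problem.

# Parameter counts for a modest 224 x 224 RGB image.
image_pixels = 224 * 224 * 3

# A single fully connected layer with 256 hidden units needs one weight per
# (pixel, unit) pair -- tens of millions of parameters.
fc_params = image_pixels * 256
print(fc_params)      # 38,535,168

# A convolutional layer with 256 filters of size 3 x 3 x 3 reuses the same
# small filter at every location, so its size is independent of the image size.
conv_params = 3 * 3 * 3 * 256
print(conv_params)    # 6,912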