The correct answer is A. To scale the output of each layer.
Batch Normalization is primarily used in Convolutional Neural Networks (CNNs) to stabilize and accelerate the training process. Here are the key points:
- To address internal covariate shift: Although option B refers to internal covariate shift, Batch Normalization mitigates this shift rather than being defined by it. Internal covariate shift is the phenomenon where the distribution of inputs to a layer changes during training as the parameters of the preceding layers change. By standardizing the inputs to each layer, Batch Normalization reduces the problems associated with this shift.
- To scale the output of each layer: This is the primary purpose of Batch Normalization. It normalizes the layer inputs so that they have zero mean and unit standard deviation, and then applies a learnable scale and shift. This stabilizes the learning process and allows faster training, often with higher learning rates (see the sketch after this list).
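To make the normalize-then-scale-and-shift step concrete, here is a minimal NumPy sketch of a batch-norm forward pass for fully connected activations in training mode. The names `batch_norm`, `gamma`, `beta`, and `eps` are illustrative choices for this example, not part of any particular framework's API.

```python
# Illustrative sketch only: a minimal batch-norm forward pass in NumPy.
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Normalize a batch of activations, then apply a learnable scale and shift.

    x     : array of shape (batch_size, num_features)
    gamma : learnable scale, shape (num_features,)
    beta  : learnable shift, shape (num_features,)
    """
    mean = x.mean(axis=0)                    # per-feature mean over the batch
    var = x.var(axis=0)                      # per-feature variance over the batch
    x_hat = (x - mean) / np.sqrt(var + eps)  # zero mean, unit variance
    return gamma * x_hat + beta              # learnable rescale and shift

# Tiny usage example
x = np.random.randn(4, 3) * 5 + 2            # activations with arbitrary mean/scale
gamma = np.ones(3)
beta = np.zeros(3)
out = batch_norm(x, gamma, beta)
print(out.mean(axis=0))                       # ~0 per feature
print(out.std(axis=0))                        # ~1 per feature
```

With `gamma = 1` and `beta = 0` the output is simply the standardized activations; during training these parameters are learned, so the network can recover any scale and shift that suits the next layer.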
So, while Batch Normalization does help avoid the issues associated with internal covariate shift (choice B), it does so specifically by scaling (and shifting) the outputs of each layer (choice A). Hence, option A is the best choice.
Option C is incorrect because it suggests that Batch Normalization avoids scaling, when in fact scaling is a fundamental part of what it does.