Deep Learning: Batch Normalization

Nov 13, 2023

Batch Normalization (BN) is a technique in deep learning that normalizes the inputs of a neural network layer by adjusting and scaling them. It was introduced to address issues related to internal covariate shift and has become a standard component in many modern neural network architectures. Here’s an in-depth look at Batch Normalization:

1. Internal Covariate Shift:

  • Internal covariate shift refers to the change in the distribution of a layer's inputs during training, caused by updates to the parameters of the preceding layers.
  • This shift can slow down training because each layer has to continuously adapt to changing input distributions.

2. Batch Normalization Concept:

Normalize:

  • Batch Normalization normalizes the inputs of a layer by subtracting the mini-batch mean and dividing by the mini-batch standard deviation, computed per feature.

Scale and Shift:

  • The normalized values are then scaled and shifted using learnable parameters so the network can adapt during training, as written out in the equations below.
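Written out for a mini-batch of m values of a given feature, the standard transform is:

```latex
\mu_B = \frac{1}{m} \sum_{i=1}^{m} x_i
\qquad
\sigma_B^2 = \frac{1}{m} \sum_{i=1}^{m} (x_i - \mu_B)^2

\hat{x}_i = \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}}
\qquad
y_i = \gamma \hat{x}_i + \beta
```

Here ε is a small constant added for numerical stability, and γ (gamma) and β (beta) are the learnable scale and shift parameters.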

3. Batch Normalization Procedure:

For a Mini-Batch:

  • Given a mini-batch of activations, calculate the mean and variance of each feature across the batch.
  • Normalize each feature by subtracting the mean and dividing by the square root of the variance plus a small epsilon (for numerical stability).
  • Scale and shift the normalized values using the learnable parameters gamma and beta, as in the sketch below.
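As a rough, framework-free sketch of these steps (the function name, shapes, and epsilon value are illustrative, not from the post):

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """Batch Normalization forward pass for a mini-batch.

    x:     (batch_size, num_features) activations
    gamma: (num_features,) learnable scale
    beta:  (num_features,) learnable shift
    """
    # Per-feature statistics computed over the batch dimension
    mean = x.mean(axis=0)
    var = x.var(axis=0)

    # Normalize to zero mean, unit variance (eps avoids division by zero)
    x_hat = (x - mean) / np.sqrt(var + eps)

    # Scale and shift with the learnable parameters
    return gamma * x_hat + beta

# Example: a batch of 4 samples with 3 features
x = np.random.randn(4, 3)
out = batch_norm_forward(x, gamma=np.ones(3), beta=np.zeros(3))
```

At inference time, running estimates of the mean and variance collected during training are used in place of the per-batch statistics.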

4. Benefits of Batch Normalization:

Accelerated Training:

  • Batch Normalization often accelerates training by reducing internal covariate shift.

Stabilizes Training:

  • Allows larger learning rates to be used, making training more stable.

Reduces Dependency on Weight Initialization:

  • The need for careful weight initialization is reduced.

5. Integration with Neural Networks:

Typical Placement:

  • Batch Normalization is usually applied after the linear or convolutional transform and before the activation function, as in the example below.
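For illustration, a small block written with PyTorch (an assumed choice of framework; the layer sizes are arbitrary) that follows this ordering:

```python
import torch.nn as nn

# Typical placement: linear layer -> BatchNorm -> activation
mlp_block = nn.Sequential(
    nn.Linear(128, 64),
    nn.BatchNorm1d(64),   # normalizes the 64 pre-activation features
    nn.ReLU(),
)
```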

In Convolutional Networks: