Max Pooling in CNNs

Vivek Singh
4 min read · Jul 5, 2022


Max pooling is a downsampling operation used in Convolutional Neural Networks (CNNs). It shrinks a feature map and gives the network a degree of spatial invariance. It is applied after a convolutional layer and before the fully connected layers.

Steps in a CNN :

Source : https://www.researchgate.net/figure/The-stages-comprising-a-typical-CNN-model-In-the-first-stage-the-pixel-level-data-from_fig4_333788445

Feature maps :

In convolutional neural networks, you look at an image through a smaller window and move that window to the right and down. That way you can find features inside the window, for example a horizontal line, a vertical line, a curve, an edge, or a shape. Wherever you find one of those features, you record it in the feature map. This combination of features is then used to identify different objects in images during testing or in production.
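The sliding window described above can be sketched in a few lines of plain Python. This is only an illustration, not how a deep learning library implements it: the function name `slide_filter`, the 4x4 image, and the 2x2 `vertical_edge` kernel are all made up for this example.

```python
def slide_filter(image, kernel):
    """Slide `kernel` over `image` (no padding, step of 1) and return the feature map."""
    k = len(kernel)
    out_h = len(image) - k + 1
    out_w = len(image[0]) - k + 1
    feature_map = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            # Multiply the window element-wise with the kernel and sum the result.
            total = sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(k)
                for j in range(k)
            )
            row.append(total)
        feature_map.append(row)
    return feature_map


# Hypothetical 4x4 image: dark (0) on the left, bright (1) on the right.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# Hypothetical 2x2 kernel that responds to a dark-to-bright vertical edge.
vertical_edge = [
    [-1, 1],
    [-1, 1],
]

feature_map = slide_filter(image, vertical_edge)
for row in feature_map:
    print(row)  # each row prints [0, 2, 0]
```

The feature map is non-zero only in the middle column, which is exactly where the window sits on top of the edge.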

Why do we need Max Pooling ?

A problem with the feature maps produced by a convolutional layer is that they depend on the location of features in the image, i.e. the layer records the precise position of each feature in the input. So when such a model is asked to recognize an object that is positioned differently in the image, it may fail to recognize it.

Cheetah Example :

Your CNN model should be able to identify all the above images as a cheetah, even when the image is tilted, shot from a different angle, rotated, horizontally squashed, or shows the animal in a different pose.

So when we downsample a feature map using max pooling, we reduce this dependency of features on their exact locations, so that the model can identify objects without being hindered by small positional differences. A lower-resolution version is created that still contains the important structural elements without the fine details (which may not be relevant, for example the exact positions of an object's features). The lower resolution also makes the model faster and lets it predict results in less time.
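To make the point about location concrete, here is a minimal sketch in plain Python. The helper `max_pool` and both 4x4 feature maps are hypothetical, and the sketch assumes a 2x2 window with a stride of 2. The second map holds the same two features as the first, each moved by one pixel.

```python
def max_pool(feature_map, size=2, stride=2):
    """Max pooling without padding over a 2D list."""
    return [
        [
            max(
                feature_map[r + i][c + j]
                for i in range(size)
                for j in range(size)
            )
            for c in range(0, len(feature_map[0]) - size + 1, stride)
        ]
        for r in range(0, len(feature_map) - size + 1, stride)
    ]


# Hypothetical feature map with two strong activations (9 and 5).
original = [
    [9, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 5],
    [0, 0, 0, 0],
]

# The same activations, each shifted by one pixel inside its 2x2 block.
shifted = [
    [0, 0, 0, 0],
    [0, 9, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 5, 0],
]

print(max_pool(original))  # [[9, 0], [0, 5]]
print(max_pool(shifted))   # [[9, 0], [0, 5]]
```

Both inputs give the same pooled output, so the layers that follow cannot tell them apart. Note that this only holds while the shift keeps a feature inside the same pooling window. A larger shift would move the value to a different cell of the output.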

The network should be able to detect the object in an image without being confused by differences in texture, the distance from which the picture was shot, the angle, and so on. To do that, the network needs to acquire a property known as “spatial invariance.”

Spatial Invariance

It allows the CNN to detect features or objects even if they do not look exactly like the images it saw during training.

Sequence of applying Max Pooling :

  1. Input Image
  2. Convolutional Layer
  3. Nonlinearity
  4. Pooling Layer

How to achieve Max Pooling ?

We choose a pooling filter (window) size and slide it over the feature map, so that the resulting output is smaller than the input. For example, a 2x2 pooling filter (4 pixels) applied to a 4x4 feature map (16 pixels) with a stride of 2 results in a 2x2 pooled feature map (4 pixels).
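The size arithmetic in that example can be checked with a small helper. This is a sketch that assumes no padding and a stride of 2 (the window moves by its own width). The function name `pooled_size` is made up for illustration.

```python
def pooled_size(input_size, pool_size, stride):
    """Output length along one dimension for pooling without padding."""
    return (input_size - pool_size) // stride + 1


# 4x4 feature map, 2x2 window, stride 2 -> 2x2 output
print(pooled_size(4, 2, 2))   # 2

# A larger, hypothetical 28x28 feature map with the same window and stride
print(pooled_size(28, 2, 2))  # 14
```

The same formula is applied separately to the height and the width of the feature map.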

Max Pooling example :

Source : https://www.geeksforgeeks.org/cnn-introduction-to-pooling-layer/

In the output calculation above, we take the highest value in every 2x2 block of the feature map, where the block size matches the pooling filter. This step discards the weaker responses. Because only the highest value of each block survives, the output keeps just the strongest, most important feature from that block.
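The same block-by-block calculation can be written out in plain Python. The numbers below are hypothetical and are not the values from the figure. The function `max_pool_2x2` is a minimal sketch that assumes a 2x2 window and a stride of 2.

```python
def max_pool_2x2(feature_map):
    """Keep the largest value of every non-overlapping 2x2 block."""
    pooled = []
    for r in range(0, len(feature_map) - 1, 2):
        row = []
        for c in range(0, len(feature_map[0]) - 1, 2):
            # The four values covered by the pooling window at (r, c).
            block = [
                feature_map[r][c],     feature_map[r][c + 1],
                feature_map[r + 1][c], feature_map[r + 1][c + 1],
            ]
            row.append(max(block))
        pooled.append(row)
    return pooled


# Hypothetical 4x4 feature map.
feature_map = [
    [1, 3, 2, 1],
    [4, 6, 5, 0],
    [7, 2, 9, 8],
    [1, 0, 3, 4],
]

pooled = max_pool_2x2(feature_map)
print(pooled)  # [[6, 5], [7, 9]]
```

Each output value is the maximum of one block: 6 for the top-left block, 5 for the top-right, 7 for the bottom-left, and 9 for the bottom-right.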

Stride is a parameter of the pooling filter that controls how many pixels the window moves at each step as it slides over the feature map.
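A quick way to see what the stride does is to list where the pooling window starts along one dimension. The helper `window_starts` below is hypothetical and assumes no padding.

```python
def window_starts(input_size, pool_size, stride):
    """Positions along one dimension where the pooling window starts (no padding)."""
    return list(range(0, input_size - pool_size + 1, stride))


# Stride 2 with a window of 2: windows sit side by side and do not overlap.
print(window_starts(4, 2, 2))  # [0, 2]

# Stride 1 with a window of 2: consecutive windows overlap by one pixel.
print(window_starts(4, 2, 1))  # [0, 1, 2]
```

A larger stride means fewer window positions and therefore a smaller output.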

Code for max pooling on a 2D feature map:

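import tensorflow as tf

# 2x2 max pooling layer; the window moves 2 pixels at a time
pool = tf.keras.layers.MaxPool2D(
    pool_size=(2, 2),   # height and width of the pooling window
    strides=(2, 2),     # step of the window along each dimension
    padding='valid',    # no padding around the input
    data_format=None,   # None falls back to the Keras default
)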

Check the official TensorFlow docs to understand the parameters in the code.

Two common pooling operations are :

Max Pooling and Average Pooling. In max pooling, we take the maximum value of each patch of the feature map. In average pooling, we take the average value of each patch of the feature map.
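The two operations differ only in how each patch is reduced to a single number. The sketch below is plain Python with made-up values. The names `pool` and `mean` are hypothetical, and it assumes a 2x2 window with a stride of 2.

```python
def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def pool(feature_map, reducer, size=2, stride=2):
    """Apply `reducer` (for example max or mean) to every patch of the feature map."""
    pooled = []
    for r in range(0, len(feature_map) - size + 1, stride):
        row = []
        for c in range(0, len(feature_map[0]) - size + 1, stride):
            patch = [
                feature_map[r + i][c + j]
                for i in range(size)
                for j in range(size)
            ]
            row.append(reducer(patch))
        pooled.append(row)
    return pooled


# Hypothetical 4x4 feature map.
feature_map = [
    [2, 4, 1, 1],
    [6, 8, 3, 3],
    [0, 0, 5, 7],
    [4, 0, 9, 3],
]

max_pooled = pool(feature_map, max)
avg_pooled = pool(feature_map, mean)

print(max_pooled)  # [[8, 3], [4, 9]]
print(avg_pooled)  # [[5.0, 2.0], [1.0, 6.0]]
```

In the bottom-left patch (0, 0, 4, 0), max pooling keeps the single strong value 4, while average pooling dilutes it to 1.0.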

That’s it from my side on Max Pooling. If you are interested, you can go through one of the most famous research papers on Max Pooling.

Please feel free to drop a comment or text anywhere you’d like : LinkedIn or email me at vivek.sinless@gmail.com

