
Feature Extraction

Unlock Computer Vision. Learn how Convolutional Neural Networks (CNNs) use kernels to 'see' shapes and pooling to optimize learning.


Tutor: How do computers 'see'? Unlike humans, they represent an image as a 2D matrix of numbers (pixel values). To extract patterns from those numbers, we use Convolutional Neural Networks.


Convolutions

Filters slide across an input to extract spatial features like edges and textures.



Feature Extraction: Convolutions & Pooling

In traditional neural networks, flattening an image discards critical spatial information. Convolutional Neural Networks (CNNs) preserve this structure by applying localized filters, enabling the model to "see" edges, textures, and shapes.

What is a Convolution?

A convolution combines an image with a small grid of numbers called a Kernel or Filter (often 3x3). The kernel slides across the original image pixel by pixel. At each position, we multiply each kernel value by the pixel value beneath it and sum the products into a single number.

This process creates a new 2D matrix called a Feature Map. Different kernels can be trained to detect different features: one might find horizontal edges, while another detects colors or corners.

Visualizing the sliding window of a kernel over an input matrix.
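
The sliding-window computation described above can be sketched in plain Python. This is a minimal illustration, not a framework implementation: the function name `convolve2d`, the 5x5 image, and the hand-picked vertical-edge kernel are all assumptions made for the example. It uses stride 1 and no padding, and, like deep learning libraries, it does not flip the kernel (strictly speaking this is cross-correlation).

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding) and return the feature map."""
    img_h, img_w = len(image), len(image[0])
    k_h, k_w = len(kernel), len(kernel[0])
    out_h, out_w = img_h - k_h + 1, img_w - k_w + 1

    feature_map = []
    for row in range(out_h):
        out_row = []
        for col in range(out_w):
            # Multiply each kernel weight by the pixel beneath it, then sum.
            total = 0
            for i in range(k_h):
                for j in range(k_w):
                    total += kernel[i][j] * image[row + i][col + j]
            out_row.append(total)
        feature_map.append(out_row)
    return feature_map


# Hypothetical 5x5 image: dark (0) on the left, bright (1) on the right.
image = [[0, 0, 0, 1, 1] for _ in range(5)]

# Hand-picked kernel that responds to dark-to-bright vertical edges.
# In a trained CNN these weights would be learned, not chosen by hand.
vertical_edge = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

feature_map = convolve2d(image, vertical_edge)
print(feature_map)  # [[0, 3, 3], [0, 3, 3], [0, 3, 3]]
```

The flat dark region produces 0, while every window that straddles the edge produces 3, which is how a feature map highlights where a pattern occurs.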

Dimensionality Reduction via Pooling

After multiple convolutions, the network generates a large volume of feature-map data. Pooling layers address this by downsampling the feature maps, which reduces the computational load and can help limit overfitting.

  • Max Pooling: The most common technique. It takes a small window (e.g., 2x2) and only keeps the maximum value. This retains the most prominent features (like a sharp edge) while discarding the rest.
  • Average Pooling: Computes the average of the values in the window. It's less common today as an intermediate layer, but a variant called Global Average Pooling (GAP), which averages each entire feature map down to a single value, is often used just before the final output layer.

Max Pooling shrinking a 4x4 grid into a 2x2 grid by extracting maximum values.
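
Both pooling variants can be sketched with one small helper. This is an illustrative sketch under stated assumptions: the function name `pool2d`, its `mode` argument, and the 4x4 values are invented for the example, and the windows are non-overlapping (the stride equals the window size).

```python
def pool2d(feature_map, size=2, mode="max"):
    """Downsample with non-overlapping size x size windows (stride equals size)."""
    pooled = []
    for row in range(0, len(feature_map) - size + 1, size):
        out_row = []
        for col in range(0, len(feature_map[0]) - size + 1, size):
            # Collect the values covered by the current window.
            window = [
                feature_map[row + i][col + j]
                for i in range(size)
                for j in range(size)
            ]
            if mode == "max":
                out_row.append(max(window))
            elif mode == "average":
                out_row.append(sum(window) / len(window))
            else:
                raise ValueError(f"unknown pooling mode: {mode}")
        pooled.append(out_row)
    return pooled


# Hypothetical 4x4 feature map.
feature_map = [
    [1, 3, 2, 0],
    [5, 6, 1, 2],
    [7, 2, 9, 4],
    [0, 1, 3, 8],
]

max_pooled = pool2d(feature_map, mode="max")
avg_pooled = pool2d(feature_map, mode="average")
print(max_pooled)  # [[6, 2], [7, 9]]
print(avg_pooled)  # [[3.75, 1.25], [2.5, 6.0]]
```

Max pooling keeps only the strongest activation in each window, while average pooling blends all four values, so a single sharp response is diluted.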

❓ Deep Learning FAQ: Convolutions

What is Stride in a Convolutional Layer?

Stride refers to how many pixels the filter moves at a time. A stride of 1 means the filter moves 1 pixel per step. A stride of 2 moves the filter 2 pixels per step, skipping every other position, which roughly halves both the height and the width of the output feature map.
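
The effect of stride on output size follows the standard formula floor((n + 2p - k) / s) + 1 for one spatial dimension, where n is the input size, k the kernel size, s the stride, and p the padding on each side. A small sketch, where the helper name `conv_output_size` and the 28-pixel input are assumptions chosen for illustration:

```python
def conv_output_size(input_size, kernel_size, stride=1, padding=0):
    """Output length along one spatial dimension: floor((n + 2p - k) / s) + 1."""
    return (input_size + 2 * padding - kernel_size) // stride + 1


# Hypothetical 28-pixel-wide input with a 3x3 kernel and no padding.
size_stride_1 = conv_output_size(28, 3, stride=1)
size_stride_2 = conv_output_size(28, 3, stride=2)
print(size_stride_1)  # 26
print(size_stride_2)  # 13
```

Going from stride 1 to stride 2 takes the output from 26 to 13 along each dimension, so the feature map holds about a quarter as many values.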

What is Padding (Same vs. Valid)?

When a filter slides over an image, pixels near the edges are covered by fewer filter positions than pixels in the center, and the output shrinks. Padding (usually zero-padding) adds a border of zeros around the image. `padding='same'` keeps the output dimensions the same as the input when the stride is 1. `padding='valid'` means no padding, so the output shrinks by the kernel size minus 1 along each dimension.
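
The difference between the two modes can be sketched by padding a small image by hand. This is a simplified illustration, assuming an odd kernel size and stride 1, in which case a border of (k - 1) / 2 zeros preserves the input size; the helper `zero_pad` and the 3x3 image are invented for the example, and real frameworks apply their own rules for even kernels and larger strides.

```python
def zero_pad(image, pad):
    """Surround `image` with a border of zeros `pad` pixels wide."""
    width = len(image[0]) + 2 * pad
    top = [[0] * width for _ in range(pad)]
    bottom = [[0] * width for _ in range(pad)]
    middle = [[0] * pad + list(row) + [0] * pad for row in image]
    return top + middle + bottom


# Hypothetical 3x3 image and a 3x3 kernel.
image = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
kernel_size = 3

# 'valid': no padding, so the output side is n - k + 1.
valid_side = len(image) - kernel_size + 1

# 'same' (stride 1, odd kernel): pad (k - 1) // 2 zeros on every side.
same_pad = (kernel_size - 1) // 2
padded = zero_pad(image, same_pad)
same_side = len(padded) - kernel_size + 1

print(valid_side)  # 1
print(same_side)   # 3
print(padded[1])   # [0, 1, 2, 3, 0]
```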

Why do we use ReLU after a Convolution?

A convolution is a linear operation (just multiplications and additions), and stacking linear operations only ever yields another linear operation. To learn complex, real-world patterns, we must introduce non-linearity. The ReLU (Rectified Linear Unit) activation function, max(0, x), is the standard choice: it sets negative values to zero, is cheap to compute, and has a gradient of 1 for positive inputs, which helps mitigate the vanishing gradient problem and speeds up training.
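
Applied to a feature map, ReLU works element by element. A minimal sketch, with the function name `relu` and the sample values chosen only for illustration:

```python
def relu(feature_map):
    """Apply max(0, x) to every element of a 2D feature map."""
    return [[max(0, value) for value in row] for row in feature_map]


# Hypothetical feature map containing negative responses.
feature_map = [
    [-3, 0, 3],
    [2, -1, 4],
]

activated = relu(feature_map)
print(activated)  # [[0, 0, 3], [2, 0, 4]]
```

Positive responses pass through unchanged, while negative ones are clipped to zero.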

Architecture Glossary

Kernel / Filter
A small matrix of weights that slides over the input data to extract features like edges or textures.
Stride
The number of pixels by which the kernel moves across the input matrix. Higher stride reduces output dimensions.
Padding
Adding zeros around the border of the input matrix to preserve spatial dimensions after convolution.
Max Pooling
A downsampling operation that selects the maximum element from each region of the feature map covered by the pooling window.