
Object Detection

Teach machines to see where objects are. Master bounding box regression, YOLO grids, and SSD anchor architecture.






Object Detection: YOLO & SSD

While Image Classification answers "What is in this image?", Object Detection answers "What is it, and exactly where is it?" by drawing bounding boxes around targets.

YOLO (You Only Look Once)

Earlier detection systems repurposed classifiers by running a "sliding window" over the image at many positions and scales and classifying each window. This approach could be accurate, but it was slow because the classifier had to be evaluated many times per image.

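YOLO takes a different approach: it frames object detection as a single regression problem. The image is passed through a convolutional neural network once. The image is divided into an S x S grid, and the grid cell that contains an object's center is responsible for detecting that object. Each grid cell predicts bounding boxes, confidence scores, and class probabilities simultaneously. In the original YOLO, each cell predicts only one set of class probabilities, so when the centers of two objects fall into the same cell, the model struggles to detect both.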
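The grid idea can be made concrete with a small sketch. The helper names `output_shape` and `responsible_cell` are hypothetical, and the values S=7, B=2, C=20 are the ones used by the original YOLO on PASCAL VOC; other versions and datasets use different values.

```python
# Illustrative values: original YOLO on PASCAL VOC (assumption for this sketch).
S, B, C = 7, 2, 20  # grid size, boxes per cell, number of classes


def output_shape(s, b, c):
    """Shape of the prediction tensor: each cell holds b boxes
    (x, y, w, h, confidence) plus c class probabilities."""
    return (s, s, b * 5 + c)


def responsible_cell(cx, cy, img_w, img_h, s):
    """Return (row, col) of the grid cell that contains the box center."""
    col = min(int(cx / img_w * s), s - 1)  # clamp centers on the right edge
    row = min(int(cy / img_h * s), s - 1)  # clamp centers on the bottom edge
    return row, col


print(output_shape(S, B, C))                    # (7, 7, 30)
print(responsible_cell(224, 100, 448, 448, S))  # (1, 3)
```

Two objects whose centers map to the same (row, col) compete for the same cell, which is why crowded scenes are hard for this design.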

SSD (Single Shot Detector)

SSD also detects objects in a single pass, but handles scale differently. Instead of relying on one feature map, SSD appends extra convolutional layers to a base network and makes predictions from several feature maps at different resolutions: higher-resolution maps are suited to smaller objects and lower-resolution maps to larger ones.

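At each feature map location, SSD uses anchor boxes (called default boxes in the SSD paper, and also known as prior boxes) of different scales and aspect ratios. The network predicts adjustments (offsets) to these anchor boxes rather than absolute coordinates. Together with the multi-scale feature maps, this lets SSD stay fast while handling small objects better than a single coarse feature map would.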
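A minimal sketch of anchor generation and offset decoding follows. It assumes a common parameterization (center shifts scaled by the anchor size, log-space scaling for width and height); real implementations often add extra scaling constants, and the function names here are hypothetical.

```python
import math


def anchors_for_cell(cx, cy, scale, aspect_ratios):
    """Anchors (cx, cy, w, h) of one scale and several aspect ratios (w / h)
    centered on one feature map location."""
    return [
        (cx, cy, scale * math.sqrt(ar), scale / math.sqrt(ar))
        for ar in aspect_ratios
    ]


def decode_box(anchor, offsets):
    """Turn predicted offsets (dx, dy, dw, dh) into a box (cx, cy, w, h)."""
    a_cx, a_cy, a_w, a_h = anchor
    dx, dy, dw, dh = offsets
    cx = a_cx + dx * a_w    # shift the center by a fraction of the anchor width
    cy = a_cy + dy * a_h    # shift the center by a fraction of the anchor height
    w = a_w * math.exp(dw)  # scale width in log space
    h = a_h * math.exp(dh)  # scale height in log space
    return (cx, cy, w, h)


anchors = anchors_for_cell(0.5, 0.5, 0.2, [1.0, 2.0, 0.5])
# Zero offsets leave the anchor unchanged.
print(decode_box(anchors[0], (0.0, 0.0, 0.0, 0.0)))
```

Because every anchor at a given scale has the same area, the aspect ratios only change the shape of the reference box, and the network only has to learn small corrections to it.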

Core Detection Concepts

What is Intersection over Union (IoU)?

Intersection over Union (IoU) measures how well a predicted bounding box matches a ground-truth bounding box. It is the area of overlap between the two boxes divided by the area of their union, so it ranges from 0 (no overlap) to 1 (perfect match). A prediction with an IoU of at least 0.5 is commonly counted as a correct detection, although stricter thresholds are also used.
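As an illustration, IoU for axis-aligned boxes can be sketched as below. The sketch assumes boxes in corner form (x1, y1, x2, y2); the helper `xywh_to_xyxy` is a hypothetical convenience for boxes stored as [x, y, width, height] with (x, y) at the top-left corner.

```python
def xywh_to_xyxy(box):
    """Convert (x, y, width, height) with a top-left origin to corner form."""
    x, y, w, h = box
    return (x, y, x + w, y + h)


def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Width and height are clamped to zero when the boxes do not overlap.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


pred = xywh_to_xyxy((0, 0, 10, 10))
truth = xywh_to_xyxy((5, 0, 10, 10))
print(iou(pred, truth))  # overlap 50, union 150, so about 0.333
```

In this example the prediction covers half of the ground-truth box, yet its IoU is only about 0.33, so it would not pass a 0.5 threshold.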

How does Non-Maximum Suppression (NMS) work?

Because algorithms like YOLO and SSD predict multiple overlapping bounding boxes for the same object, Non-Maximum Suppression (NMS) is applied to clean up the output. NMS works by:

  • Selecting the bounding box with the highest confidence score and keeping it.
  • Removing all remaining bounding boxes whose IoU with the selected box exceeds a chosen threshold.
  • Repeating the process on the boxes that are left until none remain.
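The steps above can be sketched as a greedy loop. This is a minimal, unoptimized illustration that assumes boxes in corner form (x1, y1, x2, y2) and a threshold of 0.5; in practice NMS is usually run separately for each class, and libraries provide faster vectorized versions.

```python
def iou(a, b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Return the indices of the boxes kept by greedy NMS."""
    # Candidate indices, highest confidence first.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)  # step 1: take the most confident remaining box
        keep.append(best)
        # Step 2: drop boxes that overlap the selected box too much.
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
        # Step 3: the loop repeats on whatever is left.
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2]
```

The second box overlaps the first with an IoU of about 0.68, so it is suppressed; the third box is far away and survives as a separate detection.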

Vision Glossary

Bounding Box
A rectangle that encloses an object within an image, commonly defined by [x, y, width, height] or by its corner coordinates [x1, y1, x2, y2].
mAP (Mean Average Precision)
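The standard metric for evaluating object detection models. Average Precision (the area under the precision-recall curve) is computed for each class at a given IoU threshold and then averaged across all classes.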
Anchor Box
A pre-defined box of a specific height and width, used as a reference from which the actual object box is predicted.
Confidence Score
A probability value (0 to 1) indicating how certain the model is that a bounding box contains an object.