Real-Time Vision: YOLO Basics

Object detection as a regression problem. Single-pass inference.




Vision Glossary

Bounding Box
A rectangular box defined by coordinates (x, y, width, height) that tightly encloses a detected object.
mAP (Mean Average Precision)
The standard metric used to measure the accuracy of object detectors across different categories and IoU thresholds.


Object Detection & YOLO Architecture

Author

Pascual Vila

AI Instructor // Code Syllabus

Object detection goes beyond simply telling you what's in an image; it tells you exactly where it is by drawing a bounding box.

You Only Look Once (YOLO)

Traditional models like R-CNN work in two stages: first proposing regions of interest, then classifying each one. YOLO instead processes the entire image in a single neural network pass that outputs bounding box locations and class labels together, making it fast enough for real-time applications such as self-driving cars and robotics.

Intersection over Union (IoU)

To judge whether a predicted bounding box is correct, we measure its overlap with the true bounding box. IoU divides the area of overlap by the area of union, that is, the total area covered by both boxes counted once. An IoU above 0.5 is typically considered a good prediction.
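The IoU calculation above can be sketched in a few lines of plain Python. This is a minimal illustration (the function name and the `(x, y, width, height)` box format follow the glossary definition; they are not from any particular library):

```python
def iou(box_a, box_b):
    """Intersection over Union for two boxes in (x, y, width, height) format."""
    # Convert to corner coordinates.
    ax1, ay1, ax2, ay2 = box_a[0], box_a[1], box_a[0] + box_a[2], box_a[1] + box_a[3]
    bx1, by1, bx2, by2 = box_b[0], box_b[1], box_b[0] + box_b[2], box_b[1] + box_b[3]
    # Intersection rectangle (zero if the boxes do not overlap).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    # Union = sum of areas minus the overlap counted twice.
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union if union > 0 else 0.0
```

Two identical boxes give an IoU of 1.0, disjoint boxes give 0.0, and a box shifted halfway across an identical one gives 1/3 (overlap 50, union 150).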

Non-Maximum Suppression (NMS)

YOLO might detect the same object multiple times from different grid cells. NMS cleans this up by keeping the box with the highest confidence and removing any heavily overlapping boxes.
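A greedy version of NMS, as described above, can be sketched as follows. The helper and function names are illustrative, and the boxes use the same `(x, y, width, height)` format as the glossary:

```python
def iou(box_a, box_b):
    # Overlap divided by union, for boxes in (x, y, width, height) format.
    iw = max(0.0, min(box_a[0] + box_a[2], box_b[0] + box_b[2]) - max(box_a[0], box_b[0]))
    ih = max(0.0, min(box_a[1] + box_a[3], box_b[1] + box_b[3]) - max(box_a[1], box_b[1]))
    inter = iw * ih
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy non-maximum suppression: keep the highest-confidence box,
    drop any remaining box that overlaps it too heavily, and repeat."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)        # highest-confidence box still standing
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep
```

For example, two near-identical detections of one object plus one detection far away collapse to two kept boxes: the duplicate is suppressed because its IoU with the winner exceeds the threshold.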


In YOLO, the image is divided into an S x S grid. Each grid cell predicts B bounding boxes with a confidence score for each, as well as C class probabilities. Since every box carries four coordinates plus one confidence value, the final output tensor is of size S x S x (B * 5 + C). This unified architecture ensures that YOLO understands the global context of the image, reducing background errors compared to sliding-window approaches.
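The output size formula is easy to check numerically. A minimal sketch (the function name is illustrative), using the classic YOLOv1 configuration of S=7, B=2, and C=20 classes:

```python
def yolo_output_size(S, B, C):
    """Total number of values in a YOLO v1-style output tensor.

    Each of the S*S grid cells predicts B boxes (4 coordinates + 1
    confidence score each) plus C class probabilities.
    """
    return S * S * (B * 5 + C)

# Classic YOLOv1: 7 x 7 grid, 2 boxes per cell, 20 classes -> 7 * 7 * 30 = 1470
print(yolo_output_size(7, 2, 20))
```

Doubling the grid resolution quadruples the output size, which is one reason grid granularity is a key design trade-off.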