Object Tracking in 3D: Perceiving the World in Depth
Detection tells us what is in a single frame. Tracking tells us where each object is going over time. For autonomous systems, knowing object trajectories is what makes it possible to plan a path that avoids collisions.
3D Bounding Boxes
In a 2D image, an object is enclosed in a flat rectangle (x, y, w, h). Autonomous vehicles also need depth, which they get mainly from LiDAR and stereo vision. A 3D bounding box encloses an object in physical space and is described by 7 parameters:
- Position (x, y, z): The center point of the object relative to the sensor.
- Dimensions (l, w, h): Length, width, and height of the box.
- Orientation (yaw): The rotation around the vertical axis (z-axis), denoting which direction the vehicle/object is facing.
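As a minimal sketch, the seven parameters above could be held in a small record type. The class name Box3D, its helper methods, and the choice of meters and radians as units are assumptions made for illustration; they do not come from any particular library.

```python
import math
from dataclasses import dataclass, astuple


@dataclass
class Box3D:
    """A 7-parameter 3D bounding box (hypothetical name, for illustration)."""
    x: float    # center position in the sensor frame (assumed meters)
    y: float
    z: float
    l: float    # length
    w: float    # width
    h: float    # height
    yaw: float  # rotation about the vertical z-axis (assumed radians)

    def volume(self) -> float:
        """Volume of the box; independent of position and yaw."""
        return self.l * self.w * self.h

    def heading(self) -> tuple:
        """Unit vector in the x-y plane pointing the way the object faces."""
        return (math.cos(self.yaw), math.sin(self.yaw))


# A car-sized box, 12 units ahead of the sensor, rotated 90 degrees.
car = Box3D(x=12.0, y=-3.5, z=0.9, l=4.5, w=1.8, h=1.5, yaw=math.pi / 2)
print(len(astuple(car)))  # number of parameters stored
print(car.volume())
print(car.heading())
```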
Intersection over Union (IoU)
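How do we determine if a car detected in Frame 1 is the same car detected in Frame 2? The most common metric is IoU. In 3D, we calculate the volume where the two bounding boxes overlap and divide it by the volume of their union, which is the sum of the two box volumes minus the overlap: IoU = V_overlap / (V_A + V_B - V_overlap).
An IoU of exactly 1.0 means the boxes coincide perfectly, and values close to 1.0 mean they almost do. An IoU of 0.0 means the boxes share no volume at all. Setting an appropriate min_iou_threshold is key to filtering out false associations: a candidate match whose IoU falls below the threshold is rejected.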
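The calculation can be sketched in a few lines if we make one loud simplification: the boxes are treated as axis-aligned, so yaw is ignored. A full 3D IoU for rotated boxes needs a polygon intersection in the ground plane, which is omitted here. The function names and the threshold value of 0.3 are assumptions for illustration only.

```python
def overlap_1d(center_a, size_a, center_b, size_b):
    """Length of the overlap between two intervals given by center and size."""
    lo = max(center_a - size_a / 2, center_b - size_b / 2)
    hi = min(center_a + size_a / 2, center_b + size_b / 2)
    return max(0.0, hi - lo)


def iou_3d_axis_aligned(a, b):
    """IoU of two boxes given as (x, y, z, l, w, h) tuples.

    Assumption: l, w, h lie along x, y, z respectively and yaw is ignored.
    """
    intersection = 1.0
    for axis in range(3):
        intersection *= overlap_1d(a[axis], a[axis + 3], b[axis], b[axis + 3])
    volume_a = a[3] * a[4] * a[5]
    volume_b = b[3] * b[4] * b[5]
    union = volume_a + volume_b - intersection
    return intersection / union if union > 0 else 0.0


min_iou_threshold = 0.3  # illustrative value, not a recommendation

box_frame_1 = (0.0, 0.0, 0.0, 2.0, 2.0, 2.0)
box_frame_2 = (1.0, 0.0, 0.0, 2.0, 2.0, 2.0)  # same box shifted by half its length
score = iou_3d_axis_aligned(box_frame_1, box_frame_2)
print(score, score >= min_iou_threshold)
```

In this example the overlap volume is 4 and the union is 8 + 8 - 4 = 12, so the score is one third, which passes the illustrative threshold.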
Data Association: SORT
SORT (Simple Online and Realtime Tracking) is a pragmatic approach to multiple object tracking. It marries two powerful algorithms:
- Kalman Filters: Used to predict the future state (velocity, position) of a tracked object based on its past states.
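- Hungarian Algorithm: An optimization algorithm that matches the predicted tracks to the new detections by finding the assignment with the lowest total cost. The cost is defined so that higher IoU means lower cost (for example, negative IoU or 1 - IoU), so minimizing cost maximizes total IoU.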
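To make the prediction half of this pairing concrete, here is a minimal constant-velocity Kalman filter along a single axis, written without external libraries. It is a sketch under stated assumptions, not SORT's actual state model: the class name, the noise values, and the one-dimensional state (position and velocity only) are all chosen for illustration.

```python
class ConstantVelocityKalman1D:
    """Tracks position and velocity along one axis (illustrative only)."""

    def __init__(self, position, meas_var=0.1, process_var=0.01):
        self.position = position
        self.velocity = 0.0  # unknown at first, so start at zero ...
        self.P = [[1.0, 0.0], [0.0, 100.0]]  # ... with a large velocity variance
        self.meas_var = meas_var        # assumed measurement noise variance
        self.process_var = process_var  # assumed process noise variance

    def predict(self, dt=1.0):
        """Project the state and covariance forward by dt; return the position."""
        self.position += self.velocity * dt
        (a, b), (c, d) = self.P
        q = self.process_var
        # P <- F P F^T + Q with F = [[1, dt], [0, 1]] and Q = q * I
        self.P = [
            [a + dt * (b + c) + dt * dt * d + q, b + dt * d],
            [c + dt * d, d + q],
        ]
        return self.position

    def update(self, measured_position):
        """Correct the state with a position measurement (H = [1, 0])."""
        (a, b), (c, d) = self.P
        residual = measured_position - self.position
        s = a + self.meas_var          # innovation variance
        k_pos, k_vel = a / s, c / s    # Kalman gain
        self.position += k_pos * residual
        self.velocity += k_vel * residual
        # P <- (I - K H) P
        self.P = [
            [(1 - k_pos) * a, (1 - k_pos) * b],
            [c - k_vel * a, d - k_vel * b],
        ]


# An object moving one unit per frame; the filter only ever sees positions.
track = ConstantVelocityKalman1D(position=0.0)
for z in [1.0, 2.0, 3.0, 4.0]:
    track.predict()
    track.update(z)
predicted_next = track.predict()
print(round(track.velocity, 2), round(predicted_next, 2))
```

After a few frames the velocity estimate settles near one unit per frame, so the prediction for the next frame lands near 5. That predicted position is what gets compared against new detections.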
📡 Tracking Diagnostics (FAQ)
Why is tracking necessary if detection works perfectly?
A detector processes each frame independently. It does not know whether a car in Frame 10 is the same car it saw in Frame 9. Tracking gives each object a persistent ID across time, which lets the system estimate its velocity and acceleration and predict its trajectory.
What happens if a tracked object is occluded (hidden) temporarily?
If an object goes behind a building, the detector stops reporting it. However, tracking systems use a Kalman Filter to predict where the object *should* be based on its last known velocity. The tracker keeps the "Track ID" alive in memory for a limited number of frames without a matching detection (set by the max_age parameter), so that if the object reappears within that window, it retains the same ID instead of being classified as a brand new object. If the occlusion lasts longer than max_age, the track is dropped.
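The bookkeeping behind this can be sketched as follows. The Track fields, the step_tracks function, and the exact rule (drop a track once it has gone more than max_age frames without a match) are assumptions for illustration; a real tracker would coast on a Kalman prediction rather than the plain position-plus-velocity step used here.

```python
from dataclasses import dataclass


@dataclass
class Track:
    track_id: int
    position: float
    velocity: float               # units per frame
    frames_since_update: int = 0  # frames without a matched detection


def step_tracks(tracks, matched_ids, max_age):
    """Advance one frame and return the tracks that stay alive."""
    survivors = []
    for track in tracks:
        if track.track_id in matched_ids:
            track.frames_since_update = 0
        else:
            # No detection this frame: coast on the predicted motion.
            track.position += track.velocity
            track.frames_since_update += 1
        if track.frames_since_update <= max_age:
            survivors.append(track)
    return survivors


# Track 7 disappears behind a building for three frames.
tracks = [Track(track_id=7, position=10.0, velocity=2.0)]
for _ in range(3):
    tracks = step_tracks(tracks, matched_ids=set(), max_age=5)
print(tracks)  # still alive, position advanced by prediction

# Three more frames without a detection exceed max_age, so the ID is retired.
for _ in range(3):
    tracks = step_tracks(tracks, matched_ids=set(), max_age=5)
print(tracks)
```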
How does the Hungarian Algorithm prevent ID switching?
In a crowded scene, multiple detections might overlap with multiple predicted tracks. A greedy algorithm might just assign the first overlap it finds, causing IDs to swap between nearby pedestrians. The Hungarian Algorithm builds a cost matrix of all tracks vs. all detections and finds the globally optimal assignment, the one that minimizes the total cost summed across all matches. This makes ID switches less likely, although it cannot rule them out when objects are truly ambiguous.
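The difference between greedy and global matching shows up even with two tracks and two detections. The sketch below uses a brute-force search over all permutations as a stand-in for the Hungarian Algorithm: it returns the same optimal assignment on a small square matrix but does not scale, whereas a library solver such as SciPy's linear_sum_assignment solves the same problem in polynomial time. The IoU values and function names are made up for illustration.

```python
from itertools import permutations


def greedy_assign(cost):
    """Each track, in order, grabs its cheapest still-free detection."""
    taken = set()
    pairs = []
    for t, row in enumerate(cost):
        free = [d for d in range(len(row)) if d not in taken]
        if not free:
            break
        best = min(free, key=lambda d: row[d])
        taken.add(best)
        pairs.append((t, best))
    return pairs


def optimal_assign(cost):
    """Brute-force global optimum for a small square cost matrix."""
    n = len(cost)
    best = min(
        permutations(range(n)),
        key=lambda perm: sum(cost[t][perm[t]] for t in range(n)),
    )
    return list(enumerate(best))


def total_cost(cost, pairs):
    return sum(cost[t][d] for t, d in pairs)


# Rows are predicted tracks, columns are new detections (illustrative IoUs).
iou = [
    [0.60, 0.55],
    [0.58, 0.10],
]
cost = [[1.0 - value for value in row] for row in iou]

greedy = greedy_assign(cost)
optimal = optimal_assign(cost)
print(greedy, total_cost(cost, greedy))
print(optimal, total_cost(cost, optimal))
```

Greedy matching lets track 0 take detection 0 because 0.60 is its best IoU, which leaves track 1 with a poor 0.10 match. The global solution gives track 0 its slightly worse option so that track 1 gets a good one, lowering the total cost. A common design choice, assumed here rather than shown, is to then discard any assigned pair whose IoU is still below min_iou_threshold.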
