Standard corner detectors fail when objects change size or rotate. SIFT and SURF provide mathematical 'fingerprints' that are invariant to scaling, rotation, and lighting changes.
1Scale-Invariant Features
Welcome to the heavyweights of computer vision. We have seen that basic corner detectors fail completely when an object is zoomed in or rotated. A corner at a small scale becomes a flat edge when magnified.
In this module, we will explore SIFT and SURF—revolutionary algorithms that find mathematical 'fingerprints' which remain consistent regardless of how large, small, or tilted the object appears in the image. Let's conquer scale invariance.
# SIFT Initialization
import cv2
img = cv2.imread('book.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# Initialize SIFT
sift = cv2.SIFT_create()
keypoints, descriptors = sift.detectAndCompute(gray, None)2Understanding Keypoints
What exactly is a 'keypoint' in SIFT? A keypoint object contains several critical pieces of data: its exact (X, Y) coordinates, its size (the scale at which it was found), and its angle (the dominant direction of the gradients around it).
This orientation data is what makes SIFT rotation-invariant. If the image rotates, the keypoint's angle rotates with it, ensuring that our mathematical representation remains perfectly consistent. It's a localized, highly specific anchor in the image.
# Extracting keypoint data
first_kp = keypoints[0]
print(f'Location: {first_kp.pt}')
print(f'Size: {first_kp.size}')
print(f'Angle: {first_kp.angle}')3The 128-Dimensional Descriptor
While the keypoint tells us 'where' the feature is, the 'descriptor' tells us 'what' it looks like. For every single SIFT keypoint, the algorithm generates a 128-dimensional vector of numbers. This is its mathematical fingerprint.
It analyzes a 16x16 pixel neighborhood around the keypoint, divides it into sub-blocks, and calculates gradient histograms. This complex 128-number array is extremely robust against changes in illumination and slight shifts in perspective.
# Descriptors are vectors of numbers
print(f'Detected {len(keypoints)} keypoints')
print(f'Descriptor shape: {descriptors.shape}')
# Example output: (500, 128)
# 500 keypoints, each with 128 values4Introducing SURF for Speed
SIFT is highly accurate but computationally expensive. To solve this, researchers developed SURF (Speeded-Up Robust Features). Instead of the slow Difference of Gaussians used by SIFT, SURF uses the 'Hessian Matrix' and 'Box Filters', accelerating the math using Integral Images.
SURF is designed to be fast enough for real-time video applications like augmented reality or robotics. You can initialize it using SURF_create() and pass a Hessian Threshold to control how many features you want.
# Note: SURF is often in opencv-contrib
# 400 is the Hessian Threshold
surf = cv2.xfeatures2d.SURF_create(400)
kp, des = surf.detectAndCompute(img, None)5Feature Matching and FLANN
Once we have descriptors from two different images, we need to match them. Brute force checking every point is too slow. To solve this, we use FLANN (Fast Library for Approximate Nearest Neighbors).
FLANN builds optimized internal tree structures to search high-dimensional spaces incredibly fast. Combined with David Lowe's Ratio Test (which throws away ambiguous matches), FLANN is the industry standard for high-speed, high-accuracy feature correspondence.
# FLANN parameters
index_params = dict(algorithm = 1, trees = 5)
search_params = dict(checks=50)
flann = cv2.FlannBasedMatcher(index_params, search_params)
matches = flann.knnMatch(des1, des2, k=2)