Face Recognition is the automated process of identifying or verifying a person's identity using their facial features. It is one of the most sophisticated applications of vision AI.
1. The Biometric Pipeline
Face recognition is the holy grail of biometrics. It feels like magic, but under the hood, it's not a single algorithm. It's a highly structured, multi-stage pipeline. Today, we're going to build that pipeline from scratch.
The pipeline consists of three steps. Step 1: Detection (Where is the face?). Step 2: Alignment (Fix the rotation). Step 3: Recognition (Who is this?). Each step feeds the next: if Step 1 fails to find a face, Step 3 has nothing to recognize.
# The Recognition Pipeline
# 1. Detection (Bounding Box)
# 2. Alignment (Geometric Normalization)
# 3. Recognition (Feature Vector Matching)
2. Detection (Where is the face?)
Let's start with Detection. Before Deep Learning, engineers used Haar Cascades. This algorithm scans the image looking for simple dark/light contrasts, like 'eyes are darker than the nose bridge'. It's incredibly fast, but struggles if the face is tilted or badly lit.
When the detector finds a face, it returns a 'Bounding Box'. This is an array of four numbers: [x, y, width, height]. The (x,y) is the top-left corner. We use these coordinates to draw a rectangle and physically crop the face out of the larger image.
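As a minimal sketch of the crop, assuming the image is a NumPy array indexed as [row, column] (which is how OpenCV loads images), the bounding box maps directly onto array slicing. The blank image and the box values below are made up for illustration.

```python
import numpy as np

# Stand-in for a loaded image: 480 rows (height) x 640 columns (width), 3 color channels
image = np.zeros((480, 640, 3), dtype=np.uint8)

# Hypothetical bounding box in [x, y, width, height] form
x, y, w, h = 200, 120, 100, 150

# Rows are indexed by y and columns by x, so y comes first in the slice
face_crop = image[y:y + h, x:x + w]

print(face_crop.shape)  # (150, 100, 3)
```

Note that the slice order is the reverse of the (x, y) order used when drawing the rectangle, which is a common source of bugs.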
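import cv2
# Load the pre-trained Haar Cascade that ships with OpenCV
face_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')
# Haar Cascades work on grayscale images ('photo.jpg' is a placeholder path)
image = cv2.imread('photo.jpg')
gray_img = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
# detectMultiScale returns one (x, y, w, h) bounding box per detected face
faces = face_cascade.detectMultiScale(gray_img)
for (x, y, w, h) in faces:
    cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)
3. Alignment (Fix the rotation)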
Modern systems use Deep Learning (like MTCNN or YOLO-Face) for detection. But once a face is detected, we must ALIGN it.
If a person tilts their head, the recognition algorithm might fail. Alignment algorithms locate the eyes and mathematically rotate the image so the eyes are perfectly level. This geometric normalization is critical for ensuring the features line up consistently during the final recognition step.
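The rotation described above can be sketched with plain trigonometry. This is an illustrative example, not a full alignment routine: the eye coordinates are hypothetical, the helper names `eye_angle_degrees` and `rotate_point` are invented here, and only the two eye points are rotated rather than the whole image.

```python
import math

def eye_angle_degrees(left_eye, right_eye):
    """Angle of the line between the eyes, relative to horizontal."""
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    return math.degrees(math.atan2(dy, dx))

def rotate_point(point, center, angle_degrees):
    """Rotate a point about center by -angle_degrees to undo the tilt."""
    theta = math.radians(-angle_degrees)
    px, py = point[0] - center[0], point[1] - center[1]
    rx = px * math.cos(theta) - py * math.sin(theta)
    ry = px * math.sin(theta) + py * math.cos(theta)
    return (rx + center[0], ry + center[1])

# Hypothetical eye coordinates (x, y), e.g. from a landmark detector
left_eye, right_eye = (100.0, 120.0), (160.0, 140.0)

angle = eye_angle_degrees(left_eye, right_eye)
# Rotate about the midpoint between the eyes
center = ((left_eye[0] + right_eye[0]) / 2, (left_eye[1] + right_eye[1]) / 2)

new_left = rotate_point(left_eye, center, angle)
new_right = rotate_point(right_eye, center, angle)

# After rotation both eyes share the same y coordinate
print(round(new_left[1], 6), round(new_right[1], 6))
```

In a real pipeline the same angle would typically feed an affine warp of the whole image, for example OpenCV's `cv2.getRotationMatrix2D` followed by `cv2.warpAffine`. The sign to pass depends on that library's rotation convention, so it is worth checking on a test image.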
# Step 2: Alignment
# 1. Detect Left Eye (x1, y1)
# 2. Detect Right Eye (x2, y2)
# 3. Calculate angle and apply Affine Rotation matrix
4. Recognition (Who is this?)
Now for Step 3: Recognition. We don't compare pixels directly. Instead, we feed the cropped, aligned face into a Neural Network (like FaceNet). This network compresses the entire face into a 128-dimensional array of numbers.
This is called a 'Face Embedding' or digital fingerprint. To actually recognize someone, we need a database of known embeddings. We take a picture of an employee, generate their 128D embedding, and save it. When someone walks up to the camera, we generate a NEW embedding and compare it to the saved one.
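import face_recognition
# Load the image as an RGB array ('employee.jpg' is a placeholder path)
aligned_image = face_recognition.load_image_file('employee.jpg')
# The library handles detection and embedding
# It returns one 128-dimensional vector (embedding) per detected face
encodings = face_recognition.face_encodings(aligned_image)
if encodings:  # the list is empty when no face is found
    face_embedding = encodings[0]
    print(face_embedding.shape) # Output: (128,)
5. Vector Matching (Verification)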
How do we compare these embeddings? We use math: Euclidean Distance. We measure the 'distance' between the two 128-dimensional vectors.
If the distance is small (under 0.6 is a common cutoff, and the default tolerance in the face_recognition library), the system assumes they are the same person. Adjusting this distance threshold determines your system's strictness. A higher threshold is more forgiving of bad lighting but risks accepting the wrong person, while a lower threshold requires the vectors to be nearly identical to grant access, making it more secure at the cost of rejecting legitimate users more often.
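import numpy as np
# Assumes known_database maps names to saved 128-D embeddings and
# new_camera_embedding is the embedding of the live camera frame
# Calculate Euclidean distance between the arrays
distance = np.linalg.norm(known_database['Alice'] - new_camera_embedding)
# 0.6 is a common default; lower it for stricter matching
if distance < 0.6:
    print('Welcome, Alice! Access Granted.')
else:
    print('Access Denied.')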
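The single comparison above extends naturally to a database of several people: find the closest saved embedding, then apply the threshold. The sketch below assumes a plain dictionary as the database and uses synthetic vectors in place of real embeddings. The `identify` function and its return format are invented for this example.

```python
import numpy as np

def identify(embedding, database, threshold=0.6):
    """Return (name, distance) for the closest enrolled embedding,
    or (None, distance) if even the closest one is beyond the threshold."""
    best_name, best_distance = None, float("inf")
    for name, known in database.items():
        distance = float(np.linalg.norm(known - embedding))
        if distance < best_distance:
            best_name, best_distance = name, distance
    if best_distance < threshold:
        return best_name, best_distance
    return None, best_distance

# Synthetic 128-D stand-ins for real embeddings (not produced by any face model)
alice = np.zeros(128)
bob = np.full(128, 0.1)
database = {"Alice": alice, "Bob": bob}

# A probe vector that sits exactly 0.5 away from Alice's saved embedding
probe = alice.copy()
probe[0] = 0.5

print(identify(probe, database, threshold=0.6))  # accepted as Alice
print(identify(probe, database, threshold=0.4))  # same probe rejected by the stricter threshold
```

The two calls show the trade-off from the text: the same camera embedding is accepted at 0.6 and rejected at 0.4, so tightening the threshold trades convenience for security.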