Computer Vision is the science of extracting meaningful information from digital images and videos. It is the 'eye' of Artificial Intelligence.
1. Teaching Machines to See
Welcome to Computer Vision. This field is about teaching machines to 'see' and interpret the visual world. It underpins applications such as self-driving cars, medical imaging, and face recognition.
To a human, an image is a picture full of meaning. To a computer, an image is completely devoid of meaning; it is strictly a mathematical grid of numbers. Every single number represents a pixel's intensity.
# Biological Sight vs Machine Sight
# Human: Understands context, depth, emotion
# Machine: Processes numerical matrices
2. The Pixel Grid
In a grayscale image (often loosely called black and white), each pixel's intensity is usually stored as an 8-bit unsigned integer. The value 0 represents absolute black, and the value 255 represents absolute white. Everything in between is a shade of gray.
Standard color images use three separate channels: Red, Green, and Blue (RGB). This turns the image into a 3D array. A single pixel is no longer one number; it's an array of three numbers [R, G, B] dictating how much of each color of light to mix.
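To make the color case concrete, here is a minimal sketch of a tiny RGB image in NumPy. The name `rgb_image` and the pixel values are invented for illustration only.

```python
import numpy as np

# A hypothetical 2x2 color image: each pixel is [R, G, B], values 0-255
rgb_image = np.array([
    [[255, 0, 0], [0, 255, 0]],      # red pixel, green pixel
    [[0, 0, 255], [255, 255, 255]],  # blue pixel, white pixel
], dtype=np.uint8)

top_left = rgb_image[0, 0]        # one pixel: an array of three numbers
red_channel = rgb_image[:, :, 0]  # the red value of every pixel, as a 2x2 grid

print(rgb_image.shape)       # (2, 2, 3)
print(top_left.tolist())     # [255, 0, 0]
print(red_channel.tolist())  # [[255, 0], [0, 255]]
```

Indexing with two numbers selects a whole pixel, while slicing the last axis pulls out a single color channel for the entire image.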
import numpy as np
# A simple 2x2 grayscale image in NumPy
image_matrix = np.array([
    [255, 0],   # White pixel, Black pixel
    [128, 64],  # Gray pixel, Dark Gray pixel
], dtype=np.uint8)  # 8-bit unsigned integers, matching the 0-255 range
print(image_matrix.shape) # Output: (2, 2)
3. The 3D Tensor Shape
Because color images have a height, a width, and 3 color channels, their shape in NumPy is a tuple of three numbers, and the order of those numbers matters.
A standard 1080p HD video frame is represented as an array of shape (1080, 1920, 3). The total number of values the computer has to process is Height * Width * Channels, which for this frame comes to 6,220,800.
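The arithmetic above can be checked directly in NumPy. This is a small sketch that allocates an all-black frame rather than reading a real video; the variable names are illustrative.

```python
import numpy as np

# Allocate an all-black 1080p frame: (height, width, channels), one byte per value
height, width, channels = 1080, 1920, 3
frame = np.zeros((height, width, channels), dtype=np.uint8)

# .size counts every number in the array; it equals Height * Width * Channels
assert frame.size == height * width * channels

print(frame.shape)   # (1080, 1920, 3)
print(frame.size)    # 6220800
print(frame.nbytes)  # 6220800, since each uint8 value takes one byte
```

At one byte per value, a single uncompressed frame occupies roughly 6 MB, which is one reason preprocessing often starts by resizing.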
# Image Shapes in Memory
# Shape format: (Height, Width, Channels)
hd_image_shape = (1080, 1920, 3)
# Total data points = Height * Width * Channels
total_numbers = 1080 * 1920 * 3
4. The Computer Vision Pipeline
The Computer Vision Pipeline generally follows four steps. Step 1 is Acquisition (capturing an image, for example from a camera). Step 2 is Preprocessing (resizing, removing noise). Step 3 is Feature Extraction (for example, finding edges). Step 4 is Inference (making a decision).
Many vision systems follow these foundational steps to prepare raw data before passing it to a model, such as a neural network, that classifies what was found.
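The four steps can be sketched end to end with NumPy alone. Everything here is a toy stand-in chosen for illustration: `acquire` fabricates a frame instead of reading a camera, `preprocess` only rescales intensities, and `infer` is a simple threshold rule, not a neural network. The function names and the `threshold` parameter are assumptions, not part of any library.

```python
import numpy as np

def acquire():
    # Stand-in for a camera: a synthetic 8x8 grayscale frame,
    # dark on the left half and bright on the right half
    frame = np.zeros((8, 8), dtype=np.uint8)
    frame[:, 4:] = 200
    return frame

def preprocess(frame):
    # Rescale intensities from 0-255 integers to floats in 0.0-1.0
    return frame.astype(np.float32) / 255.0

def extract_edges(img):
    # Absolute difference between horizontally adjacent pixels;
    # large values mark vertical edges
    return np.abs(np.diff(img, axis=1))

def infer(edges, threshold=0.5):
    # Toy decision rule standing in for a trained model
    return "edge detected" if edges.max() > threshold else "no edge"

features = extract_edges(preprocess(acquire()))
print(features.shape)   # (8, 7)
print(infer(features))  # edge detected
```

Each stage takes the previous stage's output as its only input, which is what makes the steps easy to swap out individually.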
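# Note: preprocess, extract_edges, and neural_network are placeholder names
# for your own implementations; acquisition happens before this function is called.
def standard_cv_pipeline(raw_camera_data):
    # Preprocess: Clean the noisy input data
    clean_img = preprocess(raw_camera_data)
    # Feature Extraction: Find mathematical patterns
    features = extract_edges(clean_img)
    # Inference: Use AI to classify what was found
    return neural_network.predict(features)
5. OpenCV and the BGR Quirk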
To build these pipelines, we use OpenCV, the industry-standard C++ library with Python bindings. But beware: OpenCV has a historical quirk. It loads color channels in BGR order (Blue, Green, Red) instead of the more common RGB order.
Because of this quirk, if you load a red stop sign with OpenCV and hand the array to a tool that assumes RGB, the sign will appear blue, because the red and blue values are swapped. You must explicitly convert BGR to RGB before displaying the image with plotting libraries like Matplotlib or rendering it in a web browser.
import cv2
# OpenCV loads as BGR, not RGB!
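img = cv2.imread('input.jpg')
# imread returns None (it does not raise) when the file is missing or unreadable
if img is None:
    raise FileNotFoundError('input.jpg')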
# Converting BGR to standard RGB
rgb_img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
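To see what the conversion does to the numbers, here is a NumPy-only sketch that needs no image file. It reverses the channel axis by hand, which for a 3-channel image should give the same channel order as `cv2.COLOR_BGR2RGB`; the array names are illustrative.

```python
import numpy as np

# A 1x1 "image" holding one pure-red pixel, stored in OpenCV's BGR order
bgr_pixel_img = np.array([[[0, 0, 255]]], dtype=np.uint8)  # [B, G, R]

# Reversing the last axis swaps the B and R channels, giving RGB order
rgb_pixel_img = bgr_pixel_img[:, :, ::-1]

print(bgr_pixel_img[0, 0].tolist())  # [0, 0, 255]  (read as RGB, this would be blue)
print(rgb_pixel_img[0, 0].tolist())  # [255, 0, 0]  (red, as intended)
```

The pixel data is the same in both arrays; only the order in which the three channel values are listed changes.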