How Computers Process Images
"A picture is worth a thousand words, but to a computer, it's just a million numbers."
Spatial Resolution
An image is fundamentally a discrete 2D grid. The building blocks of this grid are called pixels (picture elements). The number of pixels spanning the width and height of the image defines its resolution. A 1920x1080 image has 1,920 columns and 1,080 rows, for a total of 2,073,600 pixels: just over 2 million!
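As a minimal sketch of the arithmetic above, assuming NumPy as the array library, a blank 1920x1080 grayscale image can be created and its pixel count checked. The variable names are illustrative only.

```python
import numpy as np

# Hypothetical blank image: 1,080 rows (height) by 1,920 columns (width).
width, height = 1920, 1080
image = np.zeros((height, width), dtype=np.uint8)

rows, cols = image.shape
total_pixels = image.size  # rows * cols

print(rows, cols)      # 1080 1920
print(total_pixels)    # 2073600
```

Note that the shape is reported as (rows, columns), so the height comes first even though the resolution is usually written width first.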
Intensity and Bit Depth
In a grayscale image, each pixel is assigned a single number representing its brightness. Typically, computers use 8-bit unsigned integers (uint8) for this. Eight bits can encode 2^8 = 256 distinct levels, so the values range from 0 (pure black) to 255 (pure white).
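The 0 to 255 range can be checked directly. The snippet below is a small sketch, assuming NumPy; the tiny `gray` array is made-up sample data, not a real image.

```python
import numpy as np

bit_depth = 8
levels = 2 ** bit_depth          # number of distinct brightness levels
info = np.iinfo(np.uint8)        # limits of the uint8 type

# A tiny 2x3 grayscale image: black, mid-gray, white on each row.
gray = np.array([[0, 128, 255],
                 [0, 128, 255]], dtype=np.uint8)

print(levels)                              # 256
print(info.min, info.max)                  # 0 255
print(int(gray.min()), int(gray.max()))    # 0 255
```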
Color Spaces (RGB)
To represent color, we stack multiple grayscale grids on top of each other. These are called channels. The standard is RGB (Red, Green, Blue), so a color pixel is an array of 3 numbers, one per channel. Bright yellow, for instance, is fully intense in Red and Green but zero in Blue: [255, 255, 0].
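To make the channel stacking concrete, here is a small sketch, assuming NumPy and an R, G, B channel order, that fills a 2x2 image with bright yellow and then splits it back into its three grids.

```python
import numpy as np

height, width = 2, 2
# Three stacked channels: shape (height, width, 3), ordered R, G, B.
image = np.zeros((height, width, 3), dtype=np.uint8)

# Fill every pixel with bright yellow.
image[:, :] = [255, 255, 0]

# Each channel on its own is a 2D grayscale-style grid.
red, green, blue = image[..., 0], image[..., 1], image[..., 2]

print(image.shape)            # (2, 2, 3)
print(image[0, 0].tolist())   # [255, 255, 0]
print(red.shape)              # (2, 2)
print(int(blue.max()))        # 0
```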
NumPy Coordinate Warning
Y before X! Unlike standard Cartesian coordinates (x, y), image matrices are accessed by row first, then column: image[y, x]. The array's shape follows the same order, (height, width). Always remember this when indexing or slicing arrays in Python!
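A short sketch of the row-first rule, assuming NumPy. The image here is deliberately non-square so that swapping the two indices fails loudly instead of silently reading the wrong pixel.

```python
import numpy as np

# Hypothetical grayscale image: 4 rows (height) by 6 columns (width).
image = np.zeros((4, 6), dtype=np.uint8)

x, y = 5, 1            # column 5, row 1
image[y, x] = 255      # row first, then column

print(int(image[1, 5]))   # 255

# Swapping the order asks for row 5, which does not exist in a 4-row image.
try:
    image[x, y]
    swapped_ok = True
except IndexError:
    swapped_ok = False
print(swapped_ok)         # False

# Slicing follows the same order: rows (y range) first, then columns (x range).
crop = image[0:2, 3:6]
print(crop.shape)         # (2, 3)
```

On a square image the swapped index would not raise an error, which is why this mistake is easy to miss.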
❓ Frequently Asked Questions
What is a pixel in computer vision?
A pixel is the smallest controllable element of a picture represented on a screen. In computer vision, a pixel is stored in memory as a numerical value (or set of values) that represents light intensity and color. For standard 8-bit images, each value is an integer from 0 to 255.
Why do we use NumPy for images?
NumPy provides highly optimized, C-based multidimensional arrays. Since digital images are essentially 2D or 3D matrices of numbers, NumPy lets computer vision algorithms apply a mathematical operation to millions of pixels in a single vectorized call, without slow Python loops.
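The difference can be sketched with a simple brightness inversion, assuming NumPy. The image contents are made up, and the explicit loop is run on a small patch only, since it is shown for comparison rather than as a recommended approach.

```python
import numpy as np

# Hypothetical 1080x1920 grayscale image filled with a uniform gray value.
image = np.full((1080, 1920), 100, dtype=np.uint8)

# Vectorized: one expression inverts every pixel inside NumPy's compiled code.
inverted = 255 - image

# Equivalent pure-Python loop over a 4x4 patch, for comparison.
patch = image[:4, :4]
inverted_loop = np.empty_like(patch)
for y in range(patch.shape[0]):
    for x in range(patch.shape[1]):
        inverted_loop[y, x] = 255 - patch[y, x]

print(int(inverted[0, 0]))                               # 155
print(np.array_equal(inverted[:4, :4], inverted_loop))   # True
```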
What is the difference between RGB and Grayscale arrays?
A grayscale image is represented by a 2D array of shape (Height, Width) where each coordinate holds one value. An RGB image is a 3D array of shape (Height, Width, 3), where the third dimension holds the distinct Red, Green, and Blue intensity values.
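The shape difference can be checked in code. The helper below, `describe`, is a hypothetical illustration assuming NumPy; it looks only at the array's shape, so a 3-channel array in some other channel order would also be labeled "RGB".

```python
import numpy as np

def describe(image: np.ndarray) -> str:
    """Label an array as grayscale or RGB based on its shape alone."""
    if image.ndim == 2:
        return "grayscale"
    if image.ndim == 3 and image.shape[2] == 3:
        return "RGB"
    return "unknown"

gray = np.zeros((480, 640), dtype=np.uint8)
rgb = np.zeros((480, 640, 3), dtype=np.uint8)

# Indexing one coordinate gives a single value for grayscale, 3 values for RGB.
print(describe(gray), gray[10, 20].shape)   # grayscale ()
print(describe(rgb), rgb[10, 20].shape)     # RGB (3,)
```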