The Hardware of Edge AI
The cloud assumes unlimited resources. The Edge demands ruthless efficiency. To deploy AI effectively outside of data centers, you must understand the silicon it runs on.
Microcontrollers (MCUs)
MCUs are the heartbeat of embedded systems. They cost from cents to a few dollars, draw milliwatts of power, and run without a traditional operating system (bare-metal or on an RTOS).
The challenge? Memory. A typical Cortex-M4-based MCU might have only 256 KB of SRAM. Running neural networks here requires aggressive optimization (quantization) and specialized frameworks such as TensorFlow Lite for Microcontrollers, as sketched below.
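As a rough illustration, here is what a minimal TensorFlow Lite for Microcontrollers setup looks like on such a part. The model array name (g_model_data), the 40 KB arena size, and the op set are placeholders, and the TFLM constructor signature has shifted between versions, so treat this as a sketch rather than copy-paste code:

```cpp
#include <cstdint>

#include "tensorflow/lite/micro/micro_interpreter.h"
#include "tensorflow/lite/micro/micro_mutable_op_resolver.h"
#include "tensorflow/lite/schema/schema_generated.h"

// g_model_data: your quantized .tflite model exported as a C array
// (e.g. via `xxd -i model.tflite`). Placeholder name.
extern const unsigned char g_model_data[];

// The "tensor arena" is a static scratch buffer carved out of SRAM.
// 40 KB is a placeholder; it must fit alongside the rest of your app's RAM.
constexpr int kTensorArenaSize = 40 * 1024;
alignas(16) static uint8_t tensor_arena[kTensorArenaSize];

void RunInferenceOnce() {
  const tflite::Model* model = tflite::GetModel(g_model_data);

  // Register only the ops the model actually uses, to save Flash.
  static tflite::MicroMutableOpResolver<4> resolver;
  resolver.AddConv2D();
  resolver.AddMaxPool2D();
  resolver.AddFullyConnected();
  resolver.AddSoftmax();

  static tflite::MicroInterpreter interpreter(model, resolver, tensor_arena,
                                              kTensorArenaSize);

  // Plans the layout of every intermediate tensor inside the arena.
  if (interpreter.AllocateTensors() != kTfLiteOk) {
    return;  // Arena too small: activations do not fit in SRAM.
  }

  // ... fill interpreter.input(0)->data with sensor data, then:
  interpreter.Invoke();
}
```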
AI Accelerators (NPUs / TPUs)
CPUs process tasks largely sequentially. GPUs parallelize well but draw tens to hundreds of watts. Neural Processing Units (NPUs) and Tensor Processing Units (TPUs) are ASICs: custom silicon designed solely for the matrix math at the heart of deep learning.
Hardware like the Google Coral Edge TPU, or the NPUs built into modern smartphone SoCs, can execute trillions of operations per second (TOPS) while sipping just 2 to 5 watts.
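For a sense of scale, Google's published specs for the Coral Edge TPU quote 4 TOPS of int8 compute at roughly 2 W, which works out to about 2 TOPS per watt.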
SRAM vs. Flash Memory
Flash: Non-volatile storage where your model weights are permanently stored; at inference time it is effectively read-only.
SRAM: Extremely fast, volatile memory used as the 'working space' (the Tensor Arena) during inference, holding intermediate activations. Total model size is capped by Flash; the peak size of in-flight activations (driven by network width and batch size) is capped by SRAM (see the sketch below).
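A quick back-of-the-envelope check makes the asymmetry concrete. The layer dimensions below are hypothetical, chosen to show how easily a single int8 feature map outgrows a 256 KB SRAM budget:

```cpp
#include <cstdint>
#include <cstdio>

int main() {
  // Hypothetical early CNN layer: 96x96 feature map, 32 channels,
  // int8-quantized (1 byte per element).
  constexpr uint32_t kHeight = 96, kWidth = 96, kChannels = 32;
  constexpr uint32_t kActivationBytes = kHeight * kWidth * kChannels;  // 294,912 B
  constexpr uint32_t kSramBytes = 256 * 1024;                          // 262,144 B

  printf("Activation: %u bytes vs SRAM: %u bytes -> %s\n",
         (unsigned)kActivationBytes, (unsigned)kSramBytes,
         kActivationBytes > kSramBytes ? "does NOT fit" : "fits");
  return 0;
}
```

One feature map alone overflows the entire SRAM, before counting the input buffer, the next layer's output, or the application's own stack.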
🤖 Technical FAQs
What is the difference between an MCU and a CPU?
A CPU (or Microprocessor Unit, MPU) requires external RAM and storage and typically runs a full OS (Linux/Windows). An MCU (Microcontroller Unit) integrates the processor core, memory (Flash/SRAM), and I/O peripherals on a single chip and runs bare-metal code at ultra-low power.
Why is memory the biggest constraint in TinyML?
Standard deep learning models expect gigabytes of RAM; TinyML targets devices with kilobytes. When tensors are allocated, the intermediate activation maps of a CNN can easily exceed the physical limits of an MCU's SRAM, and allocation fails outright. The sketch below shows one way to detect this at startup.
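A minimal sketch against the TensorFlow Lite for Microcontrollers API, assuming an interpreter constructed as in the earlier MCU section. The helper name is ours; arena_used_bytes() is a real MicroInterpreter method, valid only after a successful AllocateTensors():

```cpp
#include <cstdio>

#include "tensorflow/lite/micro/micro_interpreter.h"

// Hypothetical helper: returns true if every tensor fit in the arena.
bool ArenaFits(tflite::MicroInterpreter& interpreter) {
  if (interpreter.AllocateTensors() != kTfLiteOk) {
    // Memory planning failed: the intermediate activations (plus
    // scratch buffers) exceed the arena, i.e. they do not fit in SRAM.
    return false;
  }
  // High-water mark of arena usage; useful for right-sizing the buffer.
  printf("Arena high-water mark: %u bytes\n",
         (unsigned)interpreter.arena_used_bytes());
  return true;
}
```

Running this once on-device tells you both whether the model fits and how much arena headroom you can reclaim for the rest of the application.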