Deploying Models to Microcontrollers
Bridging the gap between the cloud and the edge requires a fundamental shift in mindset: there is no file system to load from and no heap to allocate on demand. Instead, the AI model must be baked directly into the firmware.
The Missing Link: Byte Arrays
A standard TensorFlow Lite model is a FlatBuffer binary file (`.tflite`). Because microcontrollers lack a traditional operating system and file system, we cannot simply load the model from a file path.
Instead, we use a utility like `xxd` (or a Python script) to dump the binary contents into a C-style byte array. This array is compiled directly into the C++ program, becoming part of the microcontroller's Flash memory.
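For instance, the generated source file looks roughly like this sketch (bytes abbreviated; the symbol names are illustrative, since `xxd` derives them from the input filename, and stock `xxd` emits non-`const` arrays, so the `const` qualifier is typically added by hand to keep the data in Flash):

```cpp
// model_data.cc -- sketch of the file produced by `xxd -i model.tflite`.
// xxd names the symbols after the input file: model.tflite -> model_tflite.
// The const qualifier is added manually so the linker keeps the bytes in Flash.
const unsigned char model_tflite[] = {
    0x1c, 0x00, 0x00, 0x00, 0x54, 0x46, 0x4c, 0x33,  // FlatBuffer header; "TFL3" identifier
    0x00, 0x00, 0x0e, 0x00                            // ...thousands more bytes in a real model
};
const unsigned int model_tflite_len = 12;
```

The length variable matters: the FlatBuffer is not null-terminated, so the firmware needs it to know where the model data ends.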
Memory Management: SRAM vs Flash
Microcontrollers possess two main types of memory:
- Flash Memory (ROM): Larger, but effectively read-only at runtime. We use this to store our application code and the constant model byte array (the weights).
- SRAM: Very small (often just 256KB or less), used for variable data. This is where our Tensor Arena lives, holding intermediate calculations and inputs/outputs during inference.
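To make the split concrete, here is a minimal sketch of how the two regions are typically declared in firmware (names and sizes are illustrative): the MCU toolchain links `const` globals into the read-only Flash section, while the mutable arena buffer is reserved in SRAM.

```cpp
#include <cstdint>

// Lives in Flash: const data is linked into the read-only section on an MCU.
// (A real model array is thousands of bytes; this one is truncated for show.)
const unsigned char g_model_data[] = {0x1c, 0x00, 0x00, 0x00, 0x54, 0x46, 0x4c, 0x33};
const unsigned int g_model_data_len = sizeof(g_model_data);

// Lives in SRAM: a mutable static buffer serving as the Tensor Arena.
// 8 KB is an illustrative size; real models often need considerably more.
constexpr int kTensorArenaSize = 8 * 1024;
alignas(16) static uint8_t g_tensor_arena[kTensorArenaSize];
```

Because the weights never change, keeping them in Flash leaves the scarce SRAM free for the arena's intermediate tensors.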
TFLite Micro Architecture
Deploying involves three main C++ objects: the `tflite::Model` (a pointer that maps onto your byte array without copying it), the OpResolver (which registers only the specific math operations your model uses, to save space), and the `MicroInterpreter` (the engine that runs inference within the bounds of the Tensor Arena).
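Wired together, the three objects look roughly like this sketch. It assumes a byte array named `g_model_data` and a model that uses only fully-connected, ReLU, and softmax ops; the exact headers and the `MicroInterpreter` constructor vary between TFLite Micro releases (older ones also take an `ErrorReporter`), and this only compiles against the TFLite Micro library on a target board.

```cpp
#include "tensorflow/lite/micro/micro_interpreter.h"
#include "tensorflow/lite/micro/micro_mutable_op_resolver.h"
#include "tensorflow/lite/schema/schema_generated.h"

extern const unsigned char g_model_data[];  // the xxd-generated byte array

constexpr int kTensorArenaSize = 10 * 1024;  // illustrative; tune per model
alignas(16) static uint8_t tensor_arena[kTensorArenaSize];

void setup_inference() {
  // 1. Map the byte array in Flash onto a Model object (no copying).
  const tflite::Model* model = tflite::GetModel(g_model_data);

  // 2. Register only the ops this model needs (assumed: FC / ReLU / Softmax).
  static tflite::MicroMutableOpResolver<3> resolver;
  resolver.AddFullyConnected();
  resolver.AddRelu();
  resolver.AddSoftmax();

  // 3. The interpreter carves every tensor out of the arena in SRAM.
  static tflite::MicroInterpreter interpreter(model, resolver, tensor_arena,
                                              kTensorArenaSize);
  if (interpreter.AllocateTensors() != kTfLiteOk) {
    return;  // arena too small or an op is missing -- halt or report here
  }

  TfLiteTensor* input = interpreter.input(0);
  // ... fill input->data.f with sensor readings, then:
  interpreter.Invoke();
  TfLiteTensor* output = interpreter.output(0);
  (void)output;  // read results from output->data.f
}
```

Note the `static` locals: the interpreter and resolver must outlive `setup_inference()` so the main loop can keep calling `Invoke()`.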
❓ AI & Deployment FAQ
How do I convert a TFLite model to a C array?
Use the command-line tool `xxd`. The command `xxd -i model.tflite > model_data.cc` reads the binary file and outputs a C/C++ formatted source file containing an `unsigned char` array (add the `const` qualifier yourself so the data stays in Flash), which can be included in your Arduino or ESP32 project.
What happens if my Tensor Arena is too small?
When `interpreter->AllocateTensors()` is called during the setup phase, it fails and returns a status other than `kTfLiteOk` (often silently halting the MCU if the result is not checked). You must increase `kTensorArenaSize` to cover the peak memory usage of your model's operations.
Why do we use static memory allocation in TinyML?
Dynamic memory (`malloc`/`new`) can fragment the heap over time; on a device with only 256KB of RAM, a fragmented heap eventually fails to satisfy an allocation and crashes the application. Static allocation reserves the memory at compile time, guaranteeing the model always has the RAM it needs and keeping IoT edge devices stable over long deployments.
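A minimal illustration of the contrast (names and sizes are illustrative):

```cpp
#include <cstdint>

constexpr int kTensorArenaSize = 16 * 1024;  // fixed at compile time

// Static allocation: the buffer is reserved in SRAM for the program's entire
// lifetime. If it fits at link time, it fits forever -- no fragmentation.
alignas(16) static uint8_t tensor_arena[kTensorArenaSize];

// The dynamic alternative, avoided in TinyML: repeated new/delete cycles can
// fragment a tiny heap until a kTensorArenaSize-byte request fails, even
// though enough total free memory remains scattered in smaller pieces.
// uint8_t* arena = new uint8_t[kTensorArenaSize];
```

This is also why a too-small arena shows up at `AllocateTensors()` time rather than mid-inference: the entire budget is committed up front.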