Production ML: Intro to TF Serving
Training a model is only half the battle. To generate business value, models must be reliably served to consumers. TensorFlow Serving provides a flexible, high-performance architecture for deploying ML models into production.
The Core Concept: Servables
In TensorFlow Serving architecture, the fundamental unit is the Servable. A servable is simply the underlying object that clients use to perform computation (like a trained ML model).
TF Serving is designed to handle multiple servables and multiple versions of a servable simultaneously. This enables robust MLOps practices such as A/B testing, canary releases, and seamless zero-downtime rollbacks.
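As an illustration of multi-version serving, specific versions can be pinned and labeled through the server's `--model_config_file` option. The sketch below is hedged: the model name and base path are hypothetical, and it assumes two versions already exported on disk. It keeps versions 1 and 2 live simultaneously and labels them so clients can target "stable" or "canary" traffic by label rather than by hard-coded version number:

```
model_config_list {
  config {
    name: "my_model"
    base_path: "/models/my_model"
    model_platform: "tensorflow"
    model_version_policy {
      specific { versions: 1 versions: 2 }
    }
    version_labels { key: "stable" value: 1 }
    version_labels { key: "canary" value: 2 }
  }
}
```

Rolling back then becomes a config change (repoint the "stable" label) rather than a redeploy.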
Exporting to SavedModel
TF Serving expects models to be strictly formatted as a SavedModel. This isn't just a weights file (like an H5); it's a complete directory containing:
- `saved_model.pb`: the computation graph.
- `variables/`: the trained weights.
- Required: the SavedModel must be placed inside an integer-named folder (e.g., `/1/`, `/2/`) to signify the model version.
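Those integer folder names are not cosmetic: by default, TF Serving's version policy loads the highest-numbered subdirectory it finds. A minimal stdlib sketch of that selection logic (the directory layout it scans is whatever you point it at; nothing here calls TensorFlow itself):

```python
from pathlib import Path
from typing import Optional


def latest_version(model_dir: str) -> Optional[int]:
    """Return the highest integer-named subdirectory, mimicking
    TF Serving's default 'latest' version policy."""
    versions = [
        int(p.name)
        for p in Path(model_dir).iterdir()
        if p.is_dir() and p.name.isdigit()
    ]
    # Numeric comparison matters: as strings, "2" would sort above "10".
    return max(versions, default=None)
```

Note the numeric (not lexicographic) comparison: a folder named `10/` correctly outranks `2/`.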
gRPC vs REST API Endpoints
When the Docker container spins up, the model server binds two ports by default: 8500 (gRPC) and 8501 (REST).
While REST is easier to test with standard curl commands using JSON, gRPC is heavily preferred in production microservices for its low-latency Protocol Buffer serialization, resulting in much faster inference times for large tensors.
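The size gap behind that latency difference is easy to see without installing gRPC. The rough stdlib sketch below uses a packed float array as a stand-in for protobuf's binary float encoding (the tensor contents are made up) and compares it against the equivalent JSON request body:

```python
import array
import json

# A hypothetical 1,000-element float tensor.
values = [float(i) for i in range(1000)]

# REST: floats rendered as decimal text inside a JSON body.
json_size = len(json.dumps({"instances": [values]}).encode("utf-8"))

# gRPC (approximated): protobuf packs each float32 into 4 raw bytes;
# a packed float array is a rough stand-in for that wire size.
binary_size = len(array.array("f", values).tobytes())

print(json_size, binary_size)  # the binary payload is smaller
```

On top of the byte savings, the server skips text parsing entirely, which is where much of the large-tensor speedup comes from.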
FAQ
What is TensorFlow Serving and how does it work?
TensorFlow Serving is an open-source serving system created by Google for deploying Machine Learning models to production. It works by using a Manager to monitor local file paths or cloud storage for new model versions. When a new valid SavedModel is detected, a Loader loads the graph into memory, and it is exposed as a Servable via both REST and gRPC API endpoints without dropping active client connections.
How to deploy a model with TensorFlow Serving Docker?
To deploy via Docker, you map the host port (e.g., 8501) and bind-mount the directory containing your integer-versioned SavedModel folder to the container's `/models/` directory.
docker run -p 8501:8501 \
  --mount type=bind,source=/path/to/my_model,target=/models/my_model \
  -e MODEL_NAME=my_model -t tensorflow/serving

How do I structure the JSON payload for a TF Serving predict request?
The REST API expects a JSON payload containing an `instances` key, mapping to an array of input data (or an array of arrays for batched multidimensional inputs); a columnar `inputs` format is also accepted.
{
"instances": [
[0.0, 1.0, 50.0],
[1.0, 0.5, 20.0]
]
}
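A minimal Python client sketch for posting that payload with only the standard library. The host, port, and model name are assumptions matching the Docker example above; the two helper functions build the URL and body without needing a live server:

```python
import json
import urllib.request


def predict_url(host: str, port: int, model: str) -> str:
    """TF Serving REST predict endpoint for a named model."""
    return f"http://{host}:{port}/v1/models/{model}:predict"


def build_payload(instances) -> bytes:
    """Wrap input rows in the 'instances' key TF Serving expects."""
    return json.dumps({"instances": instances}).encode("utf-8")


def predict(host, port, model, instances):
    req = urllib.request.Request(
        predict_url(host, port, model),
        data=build_payload(instances),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        # The response body carries a 'predictions' array aligned
        # one-to-one with the submitted instances.
        return json.loads(resp.read())["predictions"]


# Example (requires a running server):
# predict("localhost", 8501, "my_model", [[0.0, 1.0, 50.0], [1.0, 0.5, 20.0]])
```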