Creating Dockerfiles For ML Models
In MLOps, reproducibility is paramount. A Dockerfile is the versioned blueprint for your model's environment, so the code serving predictions in production runs in the same environment you trained and tested it in on your laptop.
The Foundation: Base Images
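Every Dockerfile begins with a FROM instruction. For Machine Learning, the base image you select largely determines your container's security profile and final footprint. Avoid general-purpose, floating tags like ubuntu:latest: the image does not include Python, and latest can point to a different release from one build to the next. Instead, opt for a slim, version-pinned image such as python:3.9-slim, which contains only the minimal OS packages needed to run Python and pip.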
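As a minimal sketch of that choice, the opening of such a Dockerfile might look like the following. The /app working directory is an assumption for illustration, not something the base image requires.

```dockerfile
# Slim Debian-based image with Python 3.9 and pip preinstalled
FROM python:3.9-slim

# Hypothetical working directory; later COPY and RUN paths are relative to it
WORKDIR /app
```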
Layer Caching and Dependency Management
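Docker builds images in layers. Instructions such as RUN and COPY each create a new layer. If an instruction and the files it copies haven't changed, and every layer before it was also served from cache, Docker reuses the cached layer instead of rebuilding it. Once one layer is invalidated, every layer after it is rebuilt too. Since ML dependencies (like PyTorch, pandas, scikit-learn) are heavy and take minutes to download, the install step should sit as early as possible so that it stays cached.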
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY src/ .
Because the requirements file is copied before the source code, changing a single line of Python logic won't invalidate the expensive pip install layer.
❓ Frequently Asked Questions (ML Docker)
Why are Machine Learning Docker images so large?
ML images are notoriously heavy because libraries like TensorFlow and PyTorch ship massive pre-compiled binaries and bundled CUDA libraries for GPU support. To minimize size, install CPU-only wheels if inference doesn't require a GPU, use multi-stage builds so that build-time files stay out of the final image, and pass --no-cache-dir to pip install so that pip's download cache is never written into the layer.
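The three techniques can be combined in one file. The sketch below is one possible layout, not a definitive recipe: the /build and /app directories and the /opt/venv location are assumptions, and the PyTorch CPU wheel index is used only as an example of a CPU-only install for a model that happens to depend on PyTorch.

```dockerfile
# Stage 1: install dependencies into an isolated virtual environment
FROM python:3.9-slim AS builder
WORKDIR /build
RUN python -m venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"

# Assumption: the model uses PyTorch and runs inference on CPU only
RUN pip install --no-cache-dir torch --index-url https://download.pytorch.org/whl/cpu

# --no-cache-dir keeps pip's download cache out of the layer
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Stage 2: start from a clean base and copy only the installed packages
FROM python:3.9-slim
COPY --from=builder /opt/venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"
WORKDIR /app
COPY src/ .
```

Only the final stage ends up in the image you ship, so anything left behind in the builder stage does not count toward its size.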
What is the difference between RUN and CMD in a Dockerfile?
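RUN executes a command while the image is being built and saves the result in a layer (e.g., RUN pip install). CMD does nothing at build time; it defines the default command that runs when a container starts (e.g., CMD ["uvicorn", "app:main"]) and can be overridden by arguments passed to docker run.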
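A short sketch showing both instructions side by side may help. It assumes uvicorn is listed in requirements.txt; the "app:main" target is taken from the example above, while the host and port values are assumptions.

```dockerfile
FROM python:3.9-slim
WORKDIR /app
COPY requirements.txt .

# RUN executes once, at build time; its result is stored in an image layer
RUN pip install --no-cache-dir -r requirements.txt

COPY src/ .

# CMD runs nothing at build time; it records the default start command.
# "app:main" means module "app", ASGI object "main"; host and port are assumed.
CMD ["uvicorn", "app:main", "--host", "0.0.0.0", "--port", "8000"]
```

Binding to 0.0.0.0 is a common choice here because a server listening only on the container's loopback interface is not reachable through a published port.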
How do I pass environment variables to my ML model?
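You can define default variables using the ENV instruction in the Dockerfile. However, for sensitive credentials like database URIs or API keys, pass them at runtime using docker run -e KEY=VALUE or from a file using docker run --env-file .env. Do not bake secrets into the Docker image: values set with ENV are stored in the image metadata and can be read by anyone who can pull the image.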
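One way to sketch that split between defaults and secrets is shown below. The variable names MODEL_PATH, LOG_LEVEL, and DATABASE_URI and the image name my-model-image are hypothetical.

```dockerfile
FROM python:3.9-slim

# Non-sensitive defaults are fine to bake in; both names are hypothetical
ENV MODEL_PATH=/app/models/model.pkl \
    LOG_LEVEL=info

# Secrets are deliberately absent here. Supply them when the container starts:
#   docker run -e DATABASE_URI=... my-model-image
#   docker run --env-file .env my-model-image
```

A value passed with -e or --env-file overrides an ENV default of the same name, so the same image can be reused across environments.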