Creating Dockerfiles for ML Models

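In MLOps, reproducibility is paramount. A Dockerfile acts as the blueprint for your model's environment: it records the OS, Python version, and dependencies, so that the code serving predictions in production runs in the same environment it was trained and tested in on your laptop. That guarantee only holds if the base image tag and dependency versions are pinned.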

The Foundation: Base Images

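Every Dockerfile starts from a base image declared with a FROM instruction. For machine learning, the base image largely determines your container's security profile and final size. Avoid unpinned general-purpose images like ubuntu:latest, which ships without Python and whose contents change whenever the latest tag moves, and the full python:3.9 image, which bundles compilers and development headers you rarely need at inference time. Instead, opt for python:3.9-slim, which contains the minimal OS packages needed to run Python and pip.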

Layer Caching and Dependency Management

Docker builds images in layers. Instructions that change the filesystem (RUN, COPY, ADD) each create a new layer. If an instruction and the files it copies haven't changed, Docker reuses the cached layer; once one layer is invalidated, every layer after it is rebuilt. Since ML dependencies (like PyTorch, pandas, scikit-learn) are heavy and take minutes to download and install, we want the layer that installs them to stay cached.

COPY requirements.txt .
RUN pip install -r requirements.txt
COPY src/ .

By copying the requirements file before the source code, changing a single line of Python logic won't invalidate the expensive pip install cache.
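Put in context, the ordering might look like the sketch below. It assumes a requirements.txt at the project root and a src/ directory next to it; /app is an arbitrary working directory chosen for illustration.

```dockerfile
# Layers are ordered from least to most frequently changed.
FROM python:3.9-slim

# /app is an assumed location; any directory works.
WORKDIR /app

# Changes only when dependencies change, so this layer and the
# pip install below stay cached across ordinary code edits.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Changes on every code edit; only this layer and later ones rebuild.
COPY src/ .
```

After editing only a file under src/, rebuilding with docker build -t my-model . (my-model is a placeholder name) should report the pip install step as cached.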

❓ Frequently Asked Questions (ML Docker)

Why are Machine Learning Docker images so large?

ML images are notoriously heavy because libraries like TensorFlow and PyTorch ship large pre-compiled binaries, and their GPU builds bundle CUDA libraries. To minimize size, install CPU-only wheels if inference doesn't require a GPU, use multi-stage builds so build tools stay out of the final image, and pass --no-cache-dir to pip install so downloaded packages aren't stored in the layer.
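A rough sketch of how these three techniques can combine is shown below. It assumes the model uses PyTorch, that requirements.txt does not pin a different torch version (otherwise pip would replace the CPU build), and that /opt/venv and /app are acceptable paths; none of these come from a specific project.

```dockerfile
# Stage 1: install dependencies, with build tools available.
FROM python:3.9-slim AS builder

# Compilers are only needed for packages that lack pre-built wheels.
RUN apt-get update \
    && apt-get install -y --no-install-recommends build-essential \
    && rm -rf /var/lib/apt/lists/*

# Install into a virtual environment so it can be copied as one directory.
RUN python -m venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"

COPY requirements.txt .
# Assumption: PyTorch is the framework; this index serves CPU-only wheels.
RUN pip install --no-cache-dir torch --index-url https://download.pytorch.org/whl/cpu \
    && pip install --no-cache-dir -r requirements.txt

# Stage 2: runtime image without compilers or pip's download cache.
FROM python:3.9-slim

COPY --from=builder /opt/venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"

WORKDIR /app
COPY src/ .
```

Only the second stage ends up in the final image, so build-essential and anything else installed in the builder stage are left behind.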

What is the difference between RUN and CMD in a Dockerfile?

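RUN executes a command during the image build and commits the result to a layer (e.g., RUN pip install -r requirements.txt). CMD defines the default command that runs when a container starts (e.g., CMD ["uvicorn", "app:main"]). Only the last CMD in a Dockerfile takes effect, and arguments passed to docker run override it.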
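The difference is easiest to see in a tiny image. The sketch below is only an illustration: it installs uvicorn at build time and, instead of starting a real server, prints the installed version when a container starts.

```dockerfile
FROM python:3.9-slim

# RUN: executed once, at build time. The installed package is saved in a layer.
RUN pip install --no-cache-dir uvicorn

# CMD: nothing runs here at build time. This only records the default
# command to execute each time a container starts from the image.
CMD ["python", "-c", "import uvicorn; print(uvicorn.__version__)"]
```

Running docker run my-image (a placeholder image name) executes the CMD, while docker run my-image python --version replaces it for that one container.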

How do I pass environment variables to my ML model?

You can define default variables using the ENV instruction in the Dockerfile. However, for sensitive credentials like database URIs or API keys, pass them at runtime using docker run -e KEY=VALUE or docker run --env-file .env. Do not bake secrets into the Docker image: ENV values are stored in the image metadata and can be read by anyone who can pull the image.
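As a minimal sketch of that split, the Dockerfile below sets only non-sensitive defaults. MODEL_PATH, LOG_LEVEL, and API_KEY are hypothetical variable names, and the CMD merely reports what the process can see.

```dockerfile
FROM python:3.9-slim

# Non-sensitive defaults; MODEL_PATH and LOG_LEVEL are hypothetical names.
ENV MODEL_PATH=/app/model.pkl \
    LOG_LEVEL=info

# Deliberately no API keys or database URIs here: they are supplied
# when the container starts.
CMD ["python", "-c", "import os; print(os.environ['MODEL_PATH'], 'API_KEY set:', 'API_KEY' in os.environ)"]
```

Started with docker run -e API_KEY=example my-image (placeholder image name and value), the container sees API_KEY alongside the defaults; started without -e, it sees only the defaults.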

Dockerfile Syntax Glossary

FROM

Initializes a new build stage and sets the Base Image for subsequent instructions.
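A short illustration, using the slim Python image recommended earlier; the stage name builder is an arbitrary choice.

```dockerfile
# A pinned tag keeps builds repeatable. "AS builder" names the stage
# so a later stage can reference it with COPY --from=builder.
FROM python:3.9-slim AS builder
```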

WORKDIR

Sets the working directory for any RUN, CMD, ENTRYPOINT, COPY and ADD instructions that follow it. The directory is created if it doesn't exist.
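A small sketch of the effect, assuming a requirements.txt in the build context; /app is an arbitrary path.

```dockerfile
FROM python:3.9-slim

# /app is created if it does not exist.
WORKDIR /app

# The relative destination "." now resolves to /app.
COPY requirements.txt .

# Runs with /app as the current directory, so pwd reports /app.
RUN pwd
```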

COPY

Copies files or directories from the build context (or from an earlier build stage) into the image's filesystem.
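For example, assuming the build context contains requirements.txt and a src/ directory:

```dockerfile
FROM python:3.9-slim
WORKDIR /app

# Single file: lands at /app/requirements.txt.
COPY requirements.txt .

# Directory source: the contents of src/ are copied, not the directory
# itself, so name the destination to keep the folder as /app/src/.
COPY src/ ./src/
```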

RUN

Executes any commands in a new layer on top of the current image and commits the results.
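Because each RUN commits its own layer, cleanup has to happen in the same instruction as the install to actually shrink the image. A sketch, with libgomp1 standing in for whatever system package your model needs:

```dockerfile
FROM python:3.9-slim

# One RUN, one layer: update, install, and clean up together so the
# apt package lists never persist in the image.
# libgomp1 (an OpenMP runtime) is only an illustrative package.
RUN apt-get update \
    && apt-get install -y --no-install-recommends libgomp1 \
    && rm -rf /var/lib/apt/lists/*
```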

EXPOSE

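Informs Docker that the container listens on the specified network ports at runtime. EXPOSE is documentation only; it does not publish the port. Ports are published when the container is started, for example with docker run -p.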
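A minimal example; port 8000 is an assumption, chosen because it is a common default for Python API servers.

```dockerfile
FROM python:3.9-slim

# Declares the port the service is expected to listen on (TCP by default).
# 8000 is an assumed port.
EXPOSE 8000
```

To reach the service from the host, map the port at startup, e.g. docker run -p 8000:8000 my-image (placeholder image name).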

CMD

Provides defaults for an executing container. Often used to launch a web server for API models.
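A sketch of a CMD that launches an API server. It assumes src/app.py defines an ASGI application object named main, matching the app:main example in the FAQ; the port is likewise an assumption.

```dockerfile
FROM python:3.9-slim
RUN pip install --no-cache-dir uvicorn
WORKDIR /app
COPY src/ .

# Exec form (JSON array): the process runs directly, without a shell,
# so it receives the stop signal sent by docker stop.
# Binding to 0.0.0.0 makes the server reachable from outside the container.
CMD ["uvicorn", "app:main", "--host", "0.0.0.0", "--port", "8000"]
```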
