Docker Guide: Containers for ML Serving
What is a container
A container is an isolated unit that packages exactly what your application needs to run -- the code, runtime, libraries, and system tools -- so it behaves the same everywhere. It is not a virtual machine -- it does not simulate an entire operating system. It shares the host's kernel but isolates everything else: processes, filesystem, and network. When you say "it works in a container," you are making a guarantee about the environment, not just the code.
What is an image
An image is the blueprint. A container is a running instance of that blueprint. You build an image from a Dockerfile (the instructions), then run containers from that image. Images are layered -- each instruction in the Dockerfile adds a layer. Layers are cached, so rebuilding after a small change only reruns the changed layers and everything after them.
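Layer caching is why instruction order matters. A minimal sketch (the file names requirements.txt and serve.py are illustrative, not prescribed by this guide):

```dockerfile
FROM python:3.11.8-slim
WORKDIR /app

# Dependencies change rarely -- copy and install them first so this
# layer stays cached when only application code changes.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Application code changes often -- copying it last means an edit here
# invalidates only this layer and the ones after it.
COPY serve.py .
```

With this ordering, editing serve.py and rebuilding reuses the cached dependency layers; only the final COPY reruns.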
Dockerfile basics
A Dockerfile is a sequence of instructions that build an image layer by layer.
FROM -- the base image. Every Dockerfile starts here. This is the foundation your application builds on.
FROM python:3.11.8-slim
COPY -- copy files from your project into the image. Only copy what the application needs to run.
COPY requirements.txt .
RUN -- execute a command during the build. Typically used for installing dependencies.
RUN pip install --no-cache-dir -r requirements.txt
EXPOSE -- declare which port the container listens on. This is documentation -- you still need -p when running.
EXPOSE 8000
CMD -- the command that runs when the container starts. This is what your container does.
CMD ["uvicorn", "serve:app", "--host", "0.0.0.0", "--port", "8000"]
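Put together, the instructions above form a complete serving Dockerfile. A sketch, assuming the app lives in serve.py and /app is used as the working directory (both are assumptions, not stated earlier):

```dockerfile
FROM python:3.11.8-slim
# assumption: /app as the working directory
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# assumption: the serving module is serve.py
COPY serve.py .
EXPOSE 8000
CMD ["uvicorn", "serve:app", "--host", "0.0.0.0", "--port", "8000"]
```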
Build and run cycle
Build the image:
docker build -t finca-serving .
This uses the current directory (.) as the build context, reads the Dockerfile found there, executes each instruction, and tags the result as finca-serving.
Run a container from the image:
docker run -p 8000:8000 finca-serving
This starts a container from the finca-serving image and maps port 8000 on your machine to port 8000 in the container. The -p host:container flag is how you reach the container's services from outside.
Why pinning matters
An unpinned base image changes without warning:
FROM python:3 # Could be 3.11 today, 3.12 tomorrow
FROM python:3.11.8-slim # Always this exact version
When you write FROM python:3, you are saying "give me whatever Python 3 is today." If the base image updates next month with a breaking change, your build breaks even though you changed nothing. Pinning the version makes the build reproducible: the same Dockerfile pulls the same Python six months from now. (Even a specific tag can be re-pushed; for strict reproducibility you can additionally pin the image digest.) The -slim variant excludes development tools you do not need for serving, cutting the image size significantly.
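The same principle applies to Python dependencies installed inside the image. A sketch of a pinned requirements.txt (the package versions shown are illustrative, not recommendations):

```text
# requirements.txt -- pin exact versions so pip resolves the same
# packages on every build
fastapi==0.110.0
uvicorn==0.29.0
```

An unpinned line like fastapi alone would drift over time just as FROM python:3 does.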
Multi-stage builds
A multi-stage build uses multiple FROM instructions. Each one starts a new stage. You can copy artifacts from an earlier stage into a later one, leaving behind everything you do not need.
Why this matters for ML: the build stage might need compilers, build tools, and the full PyTorch installation to prepare your model. The serving stage only needs the inference runtime, your trained model, and the serving framework. A single-stage build ships everything -- build tools, training code, notebooks, datasets -- into the serving image. A multi-stage build separates what you need to build from what you need to serve.
The pattern: Stage 1 installs everything, prepares artifacts. Stage 2 starts fresh from a slim base and copies only the serving artifacts from Stage 1. The result is a smaller, cleaner serving image with nothing unnecessary.
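The pattern above can be sketched as a two-stage Dockerfile. This is a sketch under assumptions: the stage name, the export_model.py script, the model.onnx artifact, and the requirements file names are all hypothetical, stand-ins for whatever your project actually builds:

```dockerfile
# Stage 1: build -- has everything needed to prepare the model
FROM python:3.11.8 AS build
WORKDIR /app
COPY requirements-build.txt .
RUN pip install --no-cache-dir -r requirements-build.txt
COPY export_model.py .
# hypothetical step: writes model.onnx
RUN python export_model.py

# Stage 2: serve -- starts fresh from a slim base
FROM python:3.11.8-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Copy only the serving artifact from the build stage
COPY --from=build /app/model.onnx .
COPY serve.py .
EXPOSE 8000
CMD ["uvicorn", "serve:app", "--host", "0.0.0.0", "--port", "8000"]
```

Everything installed in stage 1 -- compilers, build dependencies, the export script -- is left behind; only what stage 2 explicitly copies ends up in the final image.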
.dockerignore
A .dockerignore file tells Docker which files to exclude from the build context. Without it, docker build sends your entire project directory to the Docker daemon -- including notebooks, datasets, training logs, and anything else in the directory.
Files to exclude for an ML serving image:
- Jupyter notebooks (*.ipynb)
- Training data (data/, *.csv)
- Training logs and experiment tracking
- Development tools and configs
- Git history (.git/)
- Virtual environments (venv/, .venv/)
- Documentation and markdown files
The build context should contain only what the Dockerfile references: the serving code, the model file, the requirements, and the feature pipeline.
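A .dockerignore covering the list above might look like this (directory names such as logs/ and mlruns/ are assumptions -- adjust them to your project layout):

```text
*.ipynb
data/
*.csv
logs/
mlruns/
.git/
venv/
.venv/
*.md
docs/
```

Each line is a pattern relative to the build context root; matching files are never sent to the daemon, so they cannot leak into the image even if a COPY would otherwise match them.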