How to Write a Dockerfile for AI Agents
Your Dockerfile transforms a bare operating system image into a complete runtime environment for your AI agent. Instructions that change the filesystem (RUN, COPY, and ADD) each add a layer to the final image, while the remaining instructions only record metadata, and understanding how layers work is key to building efficient images that deploy quickly and reliably. These steps walk through the entire process from base image selection to production-ready optimization, with specific attention to the requirements that make AI agent containers different from typical web application containers.
Choose the Right Base Image
The base image determines your container operating system, pre-installed tools, and starting size. For Python-based AI agents, python:3.11-slim is the standard choice. It provides a Debian-based environment with Python pre-installed while omitting development tools you do not need at runtime, keeping the base size around 120 MB.
If your agent needs CUDA for local GPU inference, start with nvidia/cuda:12.4.1-runtime-ubuntu22.04 and install Python on top. The CUDA runtime image includes only the libraries needed to run GPU workloads, not the full CUDA development toolkit. This keeps the image smaller than the devel variant while still supporting inference workloads that need only the CUDA runtime libraries. If your framework also depends on cuDNN, check whether a cuDNN-enabled runtime variant of the image is a better fit.
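As a rough sketch of that GPU setup, the fragment below starts from the CUDA runtime image named above and adds Python from the Ubuntu package repositories. Ubuntu 22.04 packages Python 3.10, so an agent that needs a newer interpreter would need a different installation route, which is not shown here.

```dockerfile
# CUDA runtime base without the compiler toolchain; tag taken from the text above
FROM nvidia/cuda:12.4.1-runtime-ubuntu22.04

# Install the distribution's Python and pip, then drop the apt package lists
# in the same layer so they do not add to the image size
RUN apt-get update \
    && apt-get install -y --no-install-recommends python3 python3-pip \
    && rm -rf /var/lib/apt/lists/*
```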
Avoid using the latest tag for base images. Pin to a specific version like python:3.11.9-slim-bookworm so that builds are reproducible. An unpinned latest tag can pull a different base image tomorrow, potentially breaking your build or introducing unexpected behavior changes. For production agents that handle important workloads, reproducibility is not optional.
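For illustration, a pinned FROM line next to the unpinned form it replaces might look like this. The tag is the one mentioned above; which patch release you pin to is your own choice.

```dockerfile
# Pinned: the Python patch version and the Debian release are both fixed
FROM python:3.11.9-slim-bookworm

# Unpinned alternative, shown only for contrast (avoid):
# FROM python:latest
```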
Alpine-based images (python:3.11-alpine) are tempting because of their small size, but they use musl libc instead of glibc. Python scientific and ML packages that publish pre-compiled wheels only for glibc have to be compiled from source on Alpine. This can extend build times dramatically and sometimes fails entirely for packages with complex native dependencies. Wheel coverage for musl has improved for some packages such as numpy and scipy, but it remains uneven across the ML ecosystem, so check each dependency before committing to Alpine.
Install System Dependencies and Python Packages
AI agent containers often need system libraries that are not included in slim base images. Common requirements include libpq-dev for PostgreSQL client libraries, build-essential for compiling Python packages with C extensions, libffi-dev for packages that use the foreign function interface, and ca-certificates for HTTPS connections to model APIs.
Install system packages in a single RUN instruction to create one layer. Combine the install command with a cache cleanup command to prevent the apt cache from inflating your image size. The pattern is RUN apt-get update, then apt-get install -y with your package list, then rm -rf /var/lib/apt/lists/*, all joined with double-ampersand operators in one RUN statement.
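Written out, the pattern described above could look like the following sketch. The package list repeats the examples from this section, and the --no-install-recommends flag is an optional addition not mentioned above that skips packages apt only recommends.

```dockerfile
FROM python:3.11.9-slim-bookworm

# One RUN instruction: refresh the package index, install, and clean up
# in the same layer so the apt lists never persist in the image
RUN apt-get update \
    && apt-get install -y --no-install-recommends \
        build-essential \
        ca-certificates \
        libffi-dev \
        libpq-dev \
    && rm -rf /var/lib/apt/lists/*
```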
For Python packages, copy your requirements.txt file into the container and run pip install before copying your agent source code. This is one of the most effective optimizations for build speed. Docker caches each layer, and since your requirements change less frequently than your code, putting pip install in an earlier layer means Docker can reuse the cached dependency layer for most builds. Only when requirements.txt actually changes does Docker re-run the pip install step.
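A minimal sketch of this ordering, assuming the project keeps requirements.txt at its root and uses /app as the working directory:

```dockerfile
FROM python:3.11.9-slim-bookworm
WORKDIR /app

# Dependency manifest first: this layer and the install below are rebuilt
# only when requirements.txt changes
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Source code last: edits here leave the dependency layer cached
COPY . .
```

The --no-cache-dir flag keeps pip's download cache out of the layer. It is an extra size optimization rather than part of the caching technique itself.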
If your agent uses large ML packages like torch, transformers, or sentence-transformers, consider splitting your requirements into two files: one for heavy ML dependencies and one for lightweight application dependencies. Install the heavy dependencies first so their cache layer persists even when you add or update small application packages.
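One way to express the split is sketched below. The file names requirements-ml.txt and requirements-app.txt are hypothetical; use whatever names fit your project.

```dockerfile
FROM python:3.11.9-slim-bookworm
WORKDIR /app

# Heavy, rarely changing ML packages (hypothetical file name)
COPY requirements-ml.txt .
RUN pip install --no-cache-dir -r requirements-ml.txt

# Lightweight application packages (hypothetical file name); changing this
# file rebuilds from here and leaves the ML layer above cached
COPY requirements-app.txt .
RUN pip install --no-cache-dir -r requirements-app.txt
```

If both files constrain the same package, the second install can change the version installed by the first, so it helps to keep shared pins consistent between the two files.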
Copy Your Agent Code and Set the Working Directory
Use the WORKDIR instruction to set a consistent working directory inside the container. The conventional path is /app, though some teams prefer /opt/agent or /home/agent depending on their deployment conventions. WORKDIR creates the directory if it does not exist and sets it as the default location for subsequent COPY, RUN, and CMD instructions.
Copy only the files your agent actually needs at runtime. Use a .dockerignore file in your project root to exclude development artifacts, test suites, documentation, git history, and local environment files. A well-maintained .dockerignore prevents accidentally including large files or sensitive credentials in your container image.
Structure your COPY instructions from least-frequently-changed to most-frequently-changed. Configuration files and dependency manifests go first, then utility modules, then your main agent code. This ordering maximizes cache hit rates because Docker invalidates a layer and all subsequent layers when any file in a COPY instruction changes.
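If your agent loads models, prompts, or configuration from files rather than environment variables, copy those files in a separate COPY instruction from your Python code. Place whichever changes more often last: a change to a later COPY leaves the earlier layers cached, while a change to an earlier COPY rebuilds every layer after it. With prompt templates copied after the code, updating a prompt does not invalidate the layer containing your main codebase.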
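The ordering described in this section might be laid out as follows. The directory names config/, utils/, agent/, and prompts/ are hypothetical, and the sketch assumes prompt templates change more often than code.

```dockerfile
FROM python:3.11.9-slim-bookworm

# Creates /app if needed and makes it the default directory for what follows
WORKDIR /app

# Least frequently changed: dependency manifest and configuration
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY config/ ./config/

# Utility modules, then the main agent code
COPY utils/ ./utils/
COPY agent/ ./agent/

# Most frequently changed (by assumption): prompt templates, copied last
# so that editing them rebuilds only this layer
COPY prompts/ ./prompts/
```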
Configure Environment Variables and the Entrypoint
Use ENV instructions to set default values for configuration that your agent reads at runtime. Common environment variables for AI agents include MODEL_ENDPOINT for the model server URL, DATABASE_URL for the persistence layer connection string, LOG_LEVEL for controlling output verbosity, and AGENT_MODE for toggling between development and production behaviors.
Do not hardcode sensitive values like API keys, database passwords, or model access tokens in ENV instructions. These values are baked into the image and visible to anyone who can pull it. Instead, set them at container start time through your Compose file environment section, an env_file reference, or a secrets manager integration.
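As an illustration, non-sensitive defaults can be declared like this. Every value shown is a placeholder. DATABASE_URL is left out deliberately because connection strings usually embed a password.

```dockerfile
FROM python:3.11.9-slim-bookworm

# Non-sensitive defaults only; all values are placeholders
ENV MODEL_ENDPOINT=http://model-server:8080 \
    LOG_LEVEL=info \
    AGENT_MODE=production

# No API keys, passwords, or tokens here: supply those when the container starts
```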
Define your container startup command with ENTRYPOINT rather than CMD when your container always runs the same program. ENTRYPOINT sets the executable, and CMD provides default arguments that users can override. For most AI agents, the pattern is ENTRYPOINT with python followed by your main script path. This makes docker run your-image immediately start the agent without requiring users to specify the command.
Set PYTHONUNBUFFERED to 1 so that Python output appears immediately in container logs rather than being buffered. Without this, Python block-buffers stdout when no terminal is attached, so print statements and any logging directed to stdout may be delayed or lost entirely if the container crashes before the buffer flushes. This is particularly important for AI agents where you want to see model interactions and decision logs in real time.
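Put together, the startup configuration could look like the sketch below. The script name main.py and the --mode argument are hypothetical stand-ins for your agent's real entry point and options.

```dockerfile
FROM python:3.11.9-slim-bookworm
WORKDIR /app
COPY . .

# Write stdout and stderr straight through so log lines are not held in a buffer
ENV PYTHONUNBUFFERED=1

# Exec (JSON array) form: python runs as PID 1 and receives stop signals directly
ENTRYPOINT ["python", "main.py"]

# Default arguments; anything passed after the image name in docker run replaces them
CMD ["--mode", "production"]
```

The exec form is used here instead of the shell form because the shell form wraps the command in /bin/sh -c, which can keep stop signals from reaching the Python process.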
Add Health Checks and Optimize the Image
Include a HEALTHCHECK instruction so Docker and orchestrators can monitor whether your agent is actually functioning, not just running. A simple health check for an HTTP-based agent tests its API endpoint with a curl command against localhost on the application port. Slim base images do not include curl, so either install it or write the check in Python. For agents without HTTP endpoints, a health check script that verifies the agent process is responsive and connected to its dependencies works equally well.
Set appropriate health check intervals and timeouts. For AI agents that may be busy with long inference calls, a reasonable starting point is an interval of 30 seconds, a timeout of 10 seconds, and a start period of 60 seconds. The start period gives your agent time to initialize, load models, and warm up: checks that fail during this window do not count toward the retry limit.
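A sketch of such a health check with the timings above follows. Port 8000 and the /health path are assumptions about the agent's API, and curl is installed explicitly because the slim image does not ship it.

```dockerfile
FROM python:3.11.9-slim-bookworm

# curl is not part of the slim image, so install it for the probe
RUN apt-get update \
    && apt-get install -y --no-install-recommends curl \
    && rm -rf /var/lib/apt/lists/*

# Probe every 30s, allow 10s per probe, and ignore failures for the first 60s;
# three consecutive failures mark the container unhealthy
HEALTHCHECK --interval=30s --timeout=10s --start-period=60s --retries=3 \
    CMD curl -f http://localhost:8000/health || exit 1
```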
Use multi-stage builds to reduce your final image size. In the first stage (the builder), install build tools and compile any packages that need them. In the second stage (the runtime), copy only the compiled packages and your application code. This removes compilers, header files, and build caches from the final image, which can reduce image size by 30 to 50 percent for agents with compiled dependencies.
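One common way to structure the two stages is to build everything into a virtual environment and copy that directory across. The sketch below takes that approach; the /opt/venv path and the main.py entry point are assumptions, not requirements.

```dockerfile
# Stage 1 (builder): has the compilers needed for packages with C extensions
FROM python:3.11.9-slim-bookworm AS builder
RUN apt-get update \
    && apt-get install -y --no-install-recommends build-essential \
    && rm -rf /var/lib/apt/lists/*

# Install all Python packages into a self-contained virtual environment
RUN python -m venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Stage 2 (runtime): same base image, no build tools
FROM python:3.11.9-slim-bookworm
COPY --from=builder /opt/venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"
WORKDIR /app
COPY . .
ENTRYPOINT ["python", "main.py"]
```

Both stages use the same base image so the copied environment finds the same interpreter at the same path. Packages that load system shared libraries at runtime still need those libraries installed in the second stage.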
Run your agent as a non-root user for security. Add a RUN instruction to create a dedicated user and group, then use the USER instruction to switch to that user before the ENTRYPOINT. Running as root inside a container is unnecessary for most AI agents and creates risk if a vulnerability allows container escape. Creating a dedicated agent user with minimal permissions is a straightforward hardening step.
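A minimal version of this hardening step on a Debian-based image might look like the following. The user and group name agent and the main.py entry point are illustrative.

```dockerfile
FROM python:3.11.9-slim-bookworm
WORKDIR /app

# Create a system group and user with no home directory and no login shell
RUN groupadd --system agent \
    && useradd --system --gid agent --no-create-home --shell /usr/sbin/nologin agent

# Copy the code and hand ownership to the new user
COPY --chown=agent:agent . .

# Everything from here on, including the running agent, uses the unprivileged user
USER agent
ENTRYPOINT ["python", "main.py"]
```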
After building your image, verify it works by running docker build with a descriptive tag, then docker run with the necessary environment variables and port mappings. Check docker images to see the final image size and docker history to review layer sizes. If any layer is unexpectedly large, investigate what files were added and whether they can be excluded or moved to an earlier, more cacheable layer.
Start with a slim Python base image, install dependencies before copying code to maximize layer caching, use multi-stage builds for compiled dependencies, and always include a HEALTHCHECK instruction for production readiness.