How to Update Dockerized AI Agents Safely

Updated May 2026
Updating a Dockerized AI agent stack involves changing container images, configuration, or both without losing data or causing extended downtime. A disciplined update process uses image versioning, staged rollouts, pre-update backups, and rollback procedures to keep your agent stack running reliably through every change.

AI agent stacks accumulate valuable state over time: conversation histories, learned preferences, vector indices, and model caches. A careless update that drops a database volume, changes an environment variable in an incompatible way, or pulls a broken image version can destroy hours or days of accumulated state and leave your agent offline. These steps establish a repeatable update process that protects your data and provides a clear path back to a working state if anything goes wrong.

Version Your Container Images

Never use the latest tag for production deployments. The latest tag is a moving target that can point to a different image every time you pull it. If you deploy with latest and need to roll back, you have no guarantee that pulling latest again gives you the previous working version. Instead, tag your images with semantic versions (v1.2.3), date-based versions (2026-05-30), or git commit hashes that uniquely identify each build.

For third-party images like PostgreSQL, Ollama, and Redis, pin to specific version tags in your Compose file. Use postgres:16.3 rather than postgres:16 or postgres:latest. Minor version updates within the same major version are generally safe, but you should still control when they happen rather than letting them arrive automatically on the next docker compose pull.
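The pinning rule above can be checked mechanically before each deployment. The following is a minimal sketch, assuming you have already extracted a mapping of service names to image references from your Compose file; the function names and the policy of flagging a bare major version such as 16 are illustrative choices, not part of Docker.

```python
import re
from typing import Optional

# Tags treated as floating in this sketch: "latest" and a bare major
# version such as "16". Adjust the policy to match your own rules.
FLOATING_TAG = re.compile(r"^(latest|\d+)$")


def split_image(ref: str) -> tuple[str, Optional[str]]:
    """Split an image reference into (repository, tag or digest)."""
    # A digest reference (repo@sha256:...) is already immutable.
    if "@" in ref:
        repo, digest = ref.split("@", 1)
        return repo, digest
    # Only a colon after the last slash separates a tag; an earlier colon
    # belongs to a registry port such as localhost:5000/agent.
    last_segment = ref.rsplit("/", 1)[-1]
    if ":" in last_segment:
        repo, tag = ref.rsplit(":", 1)
        return repo, tag
    return ref, None


def find_unpinned(images: dict[str, str]) -> list[str]:
    """Return the services whose image has no tag or a floating tag."""
    unpinned = []
    for service, ref in images.items():
        _, tag = split_image(ref)
        if tag is None or FLOATING_TAG.match(tag):
            unpinned.append(service)
    return unpinned
```

A check like this fits naturally in a pre-deployment script or CI job that fails when find_unpinned returns a non-empty list.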

Maintain a changelog or deployment log that records which image versions are running in each environment. When an update introduces a problem, this log tells you exactly which versions were working before the change. Store this log alongside your Compose file in version control so it is always available and auditable.
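One lightweight way to keep such a log is an append-only file with one JSON record per deployment. This is a sketch under assumptions: the file layout, field names, and function names are hypothetical, and a plain Markdown changelog serves the same purpose.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def record_deployment(log_path: Path, environment: str, images: dict[str, str]) -> dict:
    """Append one JSON line describing the images now running."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "environment": environment,
        "images": images,
    }
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry, sort_keys=True) + "\n")
    return entry


def previous_images(log_path: Path, environment: str) -> Optional[dict[str, str]]:
    """Return the images recorded before the latest entry, if any."""
    lines = log_path.read_text(encoding="utf-8").splitlines()
    entries = [json.loads(line) for line in lines if line.strip()]
    matching = [e for e in entries if e["environment"] == environment]
    # The last entry is the current deployment; the one before it is the
    # most recent known-good set to roll back to.
    return matching[-2]["images"] if len(matching) >= 2 else None
```

Because the file only grows by appended lines, it diffs cleanly in version control next to the Compose file.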

Build and push your custom agent images to a container registry (Docker Hub, GitHub Container Registry, or a private registry) with unique tags for each version. Local images that exist only on one machine are fragile because they can be deleted by an aggressive cleanup such as docker system prune -a, which removes every image not used by a container, or lost if the host fails. A registry provides durable, accessible storage for every version of your image.

Back Up Data Before Updating

Before updating any service, back up every named volume that contains data you cannot recreate. For PostgreSQL, use pg_dump executed inside the container to create a logical backup. For vector databases like Qdrant, use the snapshot API to create a consistent point-in-time backup. For file-based data like agent logs and conversation histories, use docker cp to copy the files from the container path where the volume is mounted to a host directory.
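The PostgreSQL step might look like the sketch below, which runs pg_dump inside the Compose service and streams the dump to a file on the host. The service name, user, and database name are placeholders you would replace with your own, and the helper names are hypothetical.

```python
import subprocess
from pathlib import Path


def pg_dump_command(service: str, user: str, database: str) -> list[str]:
    """Build a pg_dump invocation that runs inside the Compose service."""
    # -T disables pseudo-TTY allocation so the dump reaches stdout unmodified.
    return ["docker", "compose", "exec", "-T", service,
            "pg_dump", "-U", user, "-d", database]


def backup_postgres(service: str, user: str, database: str, dest: Path) -> Path:
    """Stream a logical dump into dest on the host; raise if pg_dump fails."""
    dest.parent.mkdir(parents=True, exist_ok=True)
    with dest.open("wb") as out:
        # check=True raises CalledProcessError on a non-zero exit code.
        # A partial file may remain after a failure and should be discarded.
        subprocess.run(pg_dump_command(service, user, database),
                       stdout=out, check=True)
    return dest
```

Writing the dump to a host path rather than into the container keeps the backup independent of the container's lifecycle.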

Verify your backups by restoring them to a test environment. A backup that cannot be restored is not a backup. Run a quick verification after each backup: check that the backup file is non-empty, that its size is consistent with expectations, and ideally that a test restore produces valid data. Automated backup verification catches corruption and configuration errors before you need the backup for a real recovery.
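The quick checks described above (non-empty file, size in line with expectations) are easy to automate. This sketch assumes you track the size of the previous backup yourself; the 50 percent tolerance is an arbitrary illustrative default, and none of this replaces a real test restore.

```python
from pathlib import Path
from typing import Optional


def check_backup(path: Path, previous_size: Optional[int] = None,
                 tolerance: float = 0.5) -> list[str]:
    """Return a list of problems; an empty list means the basic checks passed."""
    if not path.is_file():
        return [f"{path} does not exist"]
    problems = []
    size = path.stat().st_size
    if size == 0:
        problems.append(f"{path} is empty")
    elif previous_size and abs(size - previous_size) > tolerance * previous_size:
        # A sudden large change in size often signals a truncated dump or
        # a dump taken from the wrong database.
        problems.append(f"{path} is {size} bytes, expected about {previous_size}")
    return problems
```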

Store backups outside the Docker volume system. If a Docker update or configuration change affects your volumes, backups stored inside Docker volumes could be affected too. Copy backup files to a separate directory on the host, an external drive, or cloud storage like S3. Keep at least two previous backup sets so you can recover from a backup that turns out to be corrupted.
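Retention can be enforced with a small helper that keeps the newest few backup files and deletes the rest. The sketch assumes date-stamped file names such as agentdb-2026-05-30.sql, so that sorting by name matches sorting by age; that naming scheme and the default of three sets (the current one plus two previous) are assumptions for illustration.

```python
from pathlib import Path


def prune_backups(directory: Path, keep: int = 3, pattern: str = "*.sql") -> list[Path]:
    """Delete all but the newest `keep` matching files; return what was removed."""
    if keep < 1:
        raise ValueError("keep must be at least 1")
    # Date-stamped names sort chronologically, so reverse order is newest first.
    backups = sorted(directory.glob(pattern), key=lambda p: p.name, reverse=True)
    removed = backups[keep:]
    for path in removed:
        path.unlink()
    return removed
```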

Document your backup procedure in a runbook that any team member can follow. Include the exact commands, expected output, verification steps, and restoration procedure for each service. A backup process that only one person knows how to execute is a single point of failure in your operations.

Pull and Test New Images

Run docker compose pull to download updated images specified in your Compose file. This command only downloads images; it does not restart or modify running containers. After pulling, you can inspect the new images with docker images to verify their sizes and creation dates before deploying them.

Test new images in a non-production environment first. Clone your Compose file, point it at a separate set of volumes (or use no volumes for a disposable test), and run the updated stack. Verify that all services start, pass their health checks, and respond correctly to test requests. Check the logs for warnings or errors that were not present in the previous version.
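Checking that every service started and passed its health check can be scripted by parsing the output of docker compose ps --format json. The sketch below assumes the Service, State, and Health fields that Compose v2 emits; confirm the field names against your Compose version, since the output shape (a single array versus one object per line) has differed between releases.

```python
import json


def parse_compose_ps(output: str) -> list[dict]:
    """Parse the JSON printed by `docker compose ps --format json`.

    Depending on the Compose version, the output is either one JSON array
    or one JSON object per line, so both shapes are accepted.
    """
    text = output.strip()
    if not text:
        return []
    if text.startswith("["):
        return json.loads(text)
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def unhealthy_services(containers: list[dict]) -> list[str]:
    """Return services that are not running or whose health check is not passing."""
    bad = []
    for container in containers:
        running = container.get("State") == "running"
        # Health is an empty string when the service defines no health check.
        healthy = container.get("Health", "") in ("", "healthy")
        if not (running and healthy):
            bad.append(container.get("Service", container.get("Name", "unknown")))
    return bad
```

A service whose health status is still starting is reported as unhealthy here, so a test script would typically poll until the list is empty or a timeout expires.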

For your custom agent image, run your test suite against the new version before deploying. If you do not have automated tests, at minimum verify the critical path: the agent starts, connects to its model server, processes a test query, and stores the result in the database. Any failure in this critical path is a deployment blocker.

Compare the new image configuration with the previous version. Check for changed default environment variables, modified exposed ports, changed filesystem paths, or removed features. Container images sometimes change defaults between versions, and these changes can break your configuration even when the image itself works correctly.
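Part of this comparison can be automated by diffing the Config section that docker image inspect prints for the old and new image. The sketch below covers only default environment variables and exposed ports, and it takes the already-parsed JSON objects as input; the function names are hypothetical.

```python
from typing import Optional


def env_to_dict(env_list: Optional[list[str]]) -> dict[str, str]:
    """Turn ["KEY=value", ...] from Config.Env into a dictionary."""
    result = {}
    for item in env_list or []:
        key, _, value = item.partition("=")
        result[key] = value
    return result


def diff_image_config(old: dict, new: dict) -> dict[str, list[str]]:
    """Compare Config.Env and Config.ExposedPorts of two inspect results."""
    old_cfg, new_cfg = old.get("Config") or {}, new.get("Config") or {}
    old_env = env_to_dict(old_cfg.get("Env"))
    new_env = env_to_dict(new_cfg.get("Env"))
    # ExposedPorts is a mapping keyed by strings like "5432/tcp", or null.
    old_ports = set(old_cfg.get("ExposedPorts") or {})
    new_ports = set(new_cfg.get("ExposedPorts") or {})
    return {
        "env_added": sorted(new_env.keys() - old_env.keys()),
        "env_removed": sorted(old_env.keys() - new_env.keys()),
        "env_changed": sorted(k for k in old_env.keys() & new_env.keys()
                              if old_env[k] != new_env[k]),
        "ports_added": sorted(new_ports - old_ports),
        "ports_removed": sorted(old_ports - new_ports),
    }
```

Since docker image inspect prints a JSON array, pass the first element of each parsed result. Changed filesystem paths and removed features still need a read through the image's release notes.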

Deploy Updates with Minimal Downtime

The standard update command is docker compose up -d, which recreates only the containers whose images or configuration have changed. Containers with unchanged images and configuration are not restarted. This selective recreation minimizes downtime because only the updated services experience a brief restart.

For services that can tolerate brief downtime (most AI agents), the standard docker compose up -d approach works well. The old container is stopped and removed, then a new container is created from the updated image. The gap between stop and start is typically 1 to 5 seconds for lightweight services and 10 to 60 seconds for services that need to load models or initialize databases.

Update services in dependency order: databases first, then model servers, then agent runtimes. This ordering ensures that when the agent restarts, its dependencies are already running and healthy. If you update the agent first and the database second, the agent may crash or log errors during the brief window when the database is being updated.
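The ordering rule is a topological sort of the service dependency graph. As a sketch, assuming you maintain a mapping from each service to the services it depends on (the names below are placeholders), Python's standard graphlib module can compute the order, and each service can then be updated one at a time.

```python
from graphlib import TopologicalSorter


def update_order(depends_on: dict[str, list[str]]) -> list[str]:
    """Order services so each one comes after everything it depends on."""
    # TopologicalSorter expects {node: predecessors}, which matches the
    # shape of a depends_on mapping. It raises CycleError on a cycle.
    return list(TopologicalSorter(depends_on).static_order())


def update_commands(order: list[str]) -> list[list[str]]:
    """Build one `docker compose up` command per service, in order."""
    # --no-deps keeps Compose from starting or recreating linked services,
    # so only the named service changes at each step.
    return [["docker", "compose", "up", "-d", "--no-deps", service]
            for service in order]


if __name__ == "__main__":
    graph = {"agent": ["postgres", "ollama"], "ollama": [], "postgres": []}
    for command in update_commands(update_order(graph)):
        print(" ".join(command))
```

Running the commands one at a time, with a health check between steps, gives you a chance to stop before the agent is touched if a dependency fails to come back.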

After deploying the update, monitor the stack closely for the first 15 to 30 minutes. Watch container logs for errors, check health check status with docker compose ps, verify resource usage with docker stats, and send test requests to confirm the agent is responding correctly. Most update-related problems manifest within this initial monitoring window.

Roll Back When Updates Fail

If an update causes problems, roll back immediately rather than debugging under pressure. Change the image tags in your Compose file back to the previous working versions and run docker compose up -d. This recreates the affected containers with the old images. Your data remains safe in the named volumes, which are not affected by container recreation.
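Reverting the tags by hand is error-prone under pressure, so it can help to script the edit. The sketch below rewrites the tag on the image line of a Compose file using a regular expression; it assumes simple image: repository:tag lines and is not a full YAML parser, and the function name is hypothetical. Reverting the Compose file through version control achieves the same result.

```python
import re


def set_image_tag(compose_text: str, repository: str, tag: str) -> str:
    """Rewrite the tag on every `image: <repository>:<old tag>` line."""
    pattern = re.compile(
        r"^([ \t]*image:[ \t]*[\"']?" + re.escape(repository) + r"):[^\s\"']+",
        re.MULTILINE,
    )
    # A function replacement avoids backslash surprises in the new tag.
    new_text, count = pattern.subn(lambda m: f"{m.group(1)}:{tag}", compose_text)
    if count == 0:
        raise ValueError(f"no tagged image line found for {repository}")
    return new_text
```

After writing the result back to the Compose file, running docker compose up -d recreates only the containers whose image reference changed.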

For database schema changes that are not backward-compatible, rolling back the application image alone may not be sufficient. If the updated agent modified the database schema, the old agent version may not work with the new schema. This is why database migrations should be backward-compatible whenever possible: the old code should still work with the new schema during the transition period.

Keep the previous version images available on your host or in your container registry. Running docker compose up -d pulls images that are not available locally, so having the previous versions cached locally enables faster rollback. Avoid running docker system prune -a or docker image prune -a immediately after an update, because both remove images that no container is using, including the old images you might need for rollback.

After every successful update, document what changed and confirm the update in your deployment log. After every failed update, conduct a brief post-mortem: what went wrong, how was it detected, how long did the rollback take, and what can be improved in the update process. These post-mortems prevent the same failure from recurring and continuously improve your update procedures.

Key Takeaway

Version all container images explicitly, back up all persistent data before updates, test new images in a non-production environment, deploy with docker compose up -d for selective container recreation, and maintain a clear rollback procedure that can restore the previous working state within minutes.