Docker Health Checks for AI Services
Every service in your AI agent stack has a different definition of "healthy." A database is healthy when it can accept connections and execute queries. A model server is healthy when it has loaded a model and can serve inference requests. A vector database is healthy when its indices are loaded and search operations work. Your agent runtime is healthy when it can reach its dependencies and process requests. These steps show how to implement the right health check for each service type and wire them together into a robust startup and monitoring system.
Write Health Check Commands for Databases
PostgreSQL provides the pg_isready utility specifically for health checking. This tool attempts to connect to the PostgreSQL server and returns a zero exit code if the server is accepting connections. Configure it as your health check test command with the appropriate host, port, and user parameters. Since the health check runs inside the container, use localhost as the host and the container internal port (typically 5432).
Redis health checks use redis-cli ping, which sends a PING command and expects a PONG response. This verifies both that the Redis process is running and that it can accept and respond to commands. For Redis clusters or sentinel configurations, the health check should also verify that the node is in the correct role (primary or replica) using redis-cli role.
Set the health check interval for databases to 10 to 15 seconds. Databases have fast startup times (typically 2 to 5 seconds for PostgreSQL, under 1 second for Redis) so the start_period can be short, around 10 to 15 seconds. Set retries to 3 so Docker requires three consecutive failures before marking the container unhealthy, preventing a single slow query from triggering a false alarm.
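The settings above can be collected into a Compose-style healthcheck mapping. The sketch below builds that mapping as a Python dict and prints it as JSON, which is also valid YAML. The helper name db_healthcheck, the user name "agent", and the 5 second timeout are illustrative assumptions, not values your stack necessarily uses.

```python
import json

def db_healthcheck(test, interval="10s", timeout="5s", retries=3, start_period="15s"):
    """Return a Compose-style healthcheck mapping for a database container."""
    return {
        "test": test,
        "interval": interval,
        "timeout": timeout,
        "retries": retries,
        "start_period": start_period,
    }

# The check runs inside the container, so the host is localhost.
# "agent" is a hypothetical database user.
postgres_check = db_healthcheck(
    ["CMD", "pg_isready", "-h", "localhost", "-p", "5432", "-U", "agent"]
)
redis_check = db_healthcheck(["CMD", "redis-cli", "ping"])

print(json.dumps({"postgres": postgres_check, "redis": redis_check}, indent=2))
```

Each printed mapping would sit under the healthcheck key of the corresponding service in your Compose file.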
For databases that need initial schema setup (running migrations, creating tables, loading seed data), the health check should verify both connection readiness and schema readiness. A custom health check script can first run pg_isready, then execute a simple query against a table that your agent requires. If the table does not exist yet, the check fails, preventing the agent from starting before migrations have run.
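A minimal sketch of such a two-step check follows. It is written in Python for readability, and the same two steps translate directly to a shell script if the database image has no Python interpreter. The function name schema_ready and its parameters are hypothetical. The run parameter exists only so the logic can be exercised without a live server.

```python
import subprocess

def schema_ready(table, user, database, run=subprocess.run):
    """Return 0 if PostgreSQL accepts connections and `table` is queryable, else 1."""
    # Step 1: connection readiness. pg_isready exits 0 only when the
    # server is accepting connections.
    ready = run(["pg_isready", "-h", "localhost", "-p", "5432", "-U", user])
    if ready.returncode != 0:
        return 1
    # Step 2: schema readiness. psql exits non-zero if the table is missing.
    # `table` is interpolated into SQL, so it must be a trusted, hard-coded name.
    query = run(
        ["psql", "-h", "localhost", "-U", user, "-d", database,
         "-tAc", f"SELECT 1 FROM {table} LIMIT 1"],
        capture_output=True,
    )
    return 0 if query.returncode == 0 else 1
```

A health check script would end with something like sys.exit(schema_ready("agent_tasks", "agent", "agentdb")), where all three values are placeholders for your own table, user, and database.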
Configure Health Checks for Model Servers
Ollama provides an HTTP API on port 11434 that responds to requests at the root path. A basic health check uses curl to verify the API is responding. A more thorough check verifies that a specific model is present and ready for inference. The /api/tags endpoint lists the models that have been pulled to local storage, so use it to confirm that your required model is available. To confirm that the model is actually loaded into memory, query /api/ps, which lists the running models.
vLLM health checks should target the /health endpoint that vLLM provides. This endpoint returns a 200 status when the server is up and ready to accept inference requests. During model loading (which can take 30 seconds to several minutes for large models), the endpoint is either unreachable or returns a non-200 status. In both cases the health check fails, correctly indicating that the service is not yet ready.
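The model check can be sketched as below. The parsing assumes the response shape Ollama uses for its model lists, a JSON object with a "models" array whose entries carry a "name" field. The function names are hypothetical. Passing path="/api/ps" instead of the default would check the list of running models rather than the locally stored ones.

```python
import json
import urllib.request

def model_listed(payload, required):
    """Return True if `required` appears in an Ollama model-list response."""
    # A name without a tag is treated as ":latest".
    wanted = required if ":" in required else f"{required}:latest"
    names = {m.get("name") for m in payload.get("models", [])}
    return wanted in names

def ollama_has_model(required, base_url="http://localhost:11434",
                     path="/api/tags", timeout=5.0):
    """Query Ollama and return True if the model is listed; False on any error."""
    try:
        with urllib.request.urlopen(base_url + path, timeout=timeout) as resp:
            return model_listed(json.load(resp), required)
    except (OSError, ValueError):
        # OSError covers refused connections and timeouts; ValueError covers bad JSON.
        return False
```

A health check script would exit with 0 when ollama_has_model returns True and 1 otherwise.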
Model servers need longer start_period values than other services because loading a model into GPU memory takes significant time. A 7B model typically loads in 5 to 15 seconds, but a 70B model can take 60 to 180 seconds depending on the storage speed and GPU memory bandwidth. Set start_period to at least twice the expected model load time to account for slower loads during high system activity.
Health check timeout values for model servers should be generous. If the model server is processing a long inference request when the health check runs, it may not respond to the health check within a tight timeout. Set the timeout to 10 to 15 seconds for model servers, compared to the 3 to 5 second timeout that is appropriate for databases. The interval should also be longer (30 seconds) to avoid consuming inference capacity with frequent health check requests.
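As a rough illustration, the timing guidance for model servers can be turned into a small helper. The function name, the default port 8000, the retries value of 3, and the use of curl inside the image are assumptions made for this sketch. Only the doubled start period, the 15 second timeout, and the 30 second interval come from the guidance above.

```python
import math

def model_server_healthcheck(expected_load_seconds, port=8000):
    """Build a Compose-style healthcheck for a model server.

    start_period is twice the expected model load time, rounded up.
    """
    start_period = math.ceil(2 * expected_load_seconds)
    return {
        # curl -f exits non-zero on HTTP errors; curl in the image is an assumption.
        "test": ["CMD", "curl", "-f", f"http://localhost:{port}/health"],
        "interval": "30s",   # longer interval to spare inference capacity
        "timeout": "15s",    # generous timeout for a busy server
        "retries": 3,        # assumed; matches the database setting
        "start_period": f"{start_period}s",
    }

# A model expected to load in about 90 seconds gets a 180 second start period.
print(model_server_healthcheck(90)["start_period"])
```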
Add Health Checks for Vector Databases
Qdrant exposes a health endpoint at /healthz that returns a 200 status when the service is running. Configure the health check to curl this endpoint. Qdrant also provides a readiness endpoint, /readyz, that verifies indices are loaded and the service can serve queries, which is a more thorough check than simple process liveness.
ChromaDB health checks should target the /api/v1/heartbeat endpoint (newer ChromaDB releases serve it at /api/v2/heartbeat), which returns a timestamp when the service is responsive. Treat the check as passing only when the response contains a valid timestamp rather than an error. The heartbeat confirms that the server is responding, not that its storage is intact, so for configurations with persistent storage also verify that the persistence directory is mounted and accessible.
Weaviate provides a /v1/.well-known/ready endpoint for health checking. This endpoint returns a 200 status only when Weaviate has completed its startup sequence, loaded its schema, and is ready to accept queries. Use this rather than a simple TCP port check to ensure the service is genuinely ready.
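All three vector database checks reduce to the same probe: request a readiness URL and pass only on HTTP 200. The sketch below shows that probe using the Python standard library. The ports are the usual defaults and are assumptions here, as are the names READINESS_URLS and is_ready. The opener parameter exists only so the logic can be exercised without a running service.

```python
import urllib.request

# Paths come from the text above; ports are assumed defaults.
READINESS_URLS = {
    "qdrant": "http://localhost:6333/healthz",
    "chromadb": "http://localhost:8000/api/v1/heartbeat",
    "weaviate": "http://localhost:8080/v1/.well-known/ready",
}

def is_ready(url, timeout=5.0, opener=urllib.request.urlopen):
    """Return True only if the endpoint answers with HTTP 200 within `timeout`."""
    try:
        with opener(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # Covers refused connections, timeouts, and urllib's HTTPError/URLError.
        return False
```

Unlike a plain TCP port check, this fails while the service is listening but still reporting that it is not ready.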
Vector databases with large indices may need extended start periods similar to model servers. A Qdrant instance with a 10 GB index may take 30 to 60 seconds to load the index into memory before it can serve queries. Monitor your actual startup times and set the start_period accordingly, adding a 50 percent buffer for slower starts under load.
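The buffer arithmetic is simple enough to write down. This sketch takes startup times you have measured yourself and returns the slowest one plus a 50 percent buffer. The function name and the sample values are made up for illustration.

```python
import math

def start_period_seconds(measured_startups, buffer=0.5):
    """Return the slowest observed startup plus a proportional buffer, rounded up."""
    if not measured_startups:
        raise ValueError("need at least one measured startup time")
    return math.ceil(max(measured_startups) * (1 + buffer))

# Hypothetical measurements in seconds: the slowest is 58, so 58 * 1.5 = 87.
print(start_period_seconds([34, 41, 58]))
```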
Implement Health Checks for Your Agent Runtime
If your agent exposes an HTTP API, add a /health endpoint that returns a 200 status when the agent is ready. A thorough health endpoint does more than return a static response. It verifies that the agent can connect to its database (run a simple query), reach its model server (send a lightweight request), and access its vector database (perform a test search). If any dependency is unreachable, the health endpoint returns a 503 status.
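One possible shape for such an endpoint, using only the Python standard library, is sketched below. The names evaluate, CHECKS, and HealthHandler are hypothetical, and the three checks are placeholders that always pass. Real checks would run a simple query, send a lightweight model server request, and perform a test search.

```python
import json
from http.server import BaseHTTPRequestHandler

def evaluate(checks):
    """Run each named check; return (http_status, report), 503 if any check fails."""
    report = {}
    for name, check in checks.items():
        try:
            report[name] = "ok" if check() else "failed"
        except Exception as exc:  # a failing dependency must not crash the endpoint
            report[name] = f"error: {exc}"
    status = 200 if all(v == "ok" for v in report.values()) else 503
    return status, report

# Placeholder checks; replace each lambda with a real dependency probe.
CHECKS = {
    "database": lambda: True,
    "model_server": lambda: True,
    "vector_db": lambda: True,
}

class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/health":
            self.send_error(404)
            return
        status, report = evaluate(CHECKS)
        body = json.dumps(report).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
```

To serve it, an agent could run HTTPServer(("0.0.0.0", 8080), HealthHandler).serve_forever() from http.server in a background thread, with the port chosen to suit your container.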
For agents without HTTP endpoints (like agents that poll a queue or run on a schedule), create a health check script that tests the same conditions. The script should verify process liveness, dependency connectivity, and basic functionality. Docker runs this script at the configured interval and uses its exit code (0 for healthy, 1 for unhealthy) to determine container health status.
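A sketch of such a script's building blocks follows. It assumes a convention that is not part of Docker: the agent touches a heartbeat file on every loop iteration, and the check treats a stale file as a dead process. The file path, host names, and function names are all hypothetical.

```python
import os
import socket
import time

def heartbeat_fresh(path, max_age_seconds, now=None):
    """True if the agent touched `path` within the last `max_age_seconds`."""
    now = time.time() if now is None else now
    try:
        return now - os.path.getmtime(path) <= max_age_seconds
    except OSError:
        # File missing or unreadable: the agent never started its loop.
        return False

def tcp_reachable(host, port, timeout=3.0):
    """True if a TCP connection to the dependency can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def exit_code(results):
    """Map check results to Docker's convention: 0 healthy, 1 unhealthy."""
    return 0 if all(results) else 1
```

The script itself would finish with a line such as sys.exit(exit_code([heartbeat_fresh("/tmp/agent.heartbeat", 60), tcp_reachable("postgres", 5432)])).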
Avoid health checks that are too expensive to run frequently. A health check that sends a full inference request to the model server consumes GPU resources and adds latency to real requests. Instead, use lightweight checks: verify the HTTP connection is open, confirm authentication succeeds, and check that the model server reports a loaded model. Save full end-to-end testing for your monitoring system rather than the Docker health check.
Log health check failures inside your agent so you can diagnose intermittent health issues. When a dependency check fails, log the specific error (connection refused, timeout, authentication failure, unexpected response) with enough context to identify the root cause. These logs are invaluable when debugging why Docker marked a container unhealthy. Note that Docker on its own only records the unhealthy status. Restarting an unhealthy container requires an orchestrator such as Swarm or Kubernetes, or a separate watchdog.
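One way to produce those categories is to classify the exception raised by the failing check before logging it. The sketch below assumes the checks use urllib and sockets from the Python standard library. The logger name and function names are hypothetical.

```python
import logging
import socket
import urllib.error

log = logging.getLogger("agent.health")

def classify(exc):
    """Map an exception from a dependency check to a short failure category."""
    # HTTPError is a subclass of URLError, so it must be tested first.
    if isinstance(exc, urllib.error.HTTPError):
        if exc.code in (401, 403):
            return "authentication failure"
        return f"unexpected response (HTTP {exc.code})"
    if isinstance(exc, ConnectionRefusedError):
        return "connection refused"
    if isinstance(exc, (TimeoutError, socket.timeout)):
        return "timeout"
    if isinstance(exc, urllib.error.URLError):
        # urllib wraps low-level errors; classify the wrapped reason if there is one.
        if isinstance(exc.reason, Exception):
            return classify(exc.reason)
        return f"unreachable ({exc.reason})"
    return f"unexpected error ({type(exc).__name__})"

def log_failure(dependency, target, exc):
    """Log one failed dependency check with enough context to find the cause."""
    log.warning("health check failed: dependency=%s target=%s cause=%s detail=%s",
                dependency, target, classify(exc), exc)
```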
Wire Health Checks into Dependency Ordering
Docker Compose depends_on with condition: service_healthy waits for the dependency to pass its health check before starting the dependent service. This is the correct way to handle startup ordering for AI agent stacks where services must be fully ready, not just running, before their dependents start.
Map out your service dependency graph and configure depends_on accordingly. In a typical AI agent stack, the database, the model server, and the vector database do not depend on one another, so Compose starts them in parallel: the database accepts connections, the model server loads its model, and the vector database loads its indices, each verified by its own health check. The agent runtime lists all three under depends_on, so it starts last, once every one of them is healthy.
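The agent's side of that graph can be expressed as a Compose long-form depends_on mapping. The sketch below builds it as a Python dict and prints it as JSON, which is also valid YAML. The service names postgres, ollama, qdrant, and agent are hypothetical.

```python
import json

# Hypothetical names for the three infrastructure services.
INFRASTRUCTURE = ["postgres", "ollama", "qdrant"]

def agent_depends_on(dependencies):
    """Build a long-form depends_on mapping that waits for healthy dependencies."""
    return {name: {"condition": "service_healthy"} for name in dependencies}

agent_service = {"depends_on": agent_depends_on(INFRASTRUCTURE)}
print(json.dumps({"services": {"agent": agent_service}}, indent=2))
```

Each service named here needs its own healthcheck defined, since service_healthy has nothing to wait on otherwise.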
Handle circular health check dependencies by breaking the cycle. If service A checks for service B in its health check, and service B checks for service A, neither will ever become healthy. Break the cycle by having one service use a simpler health check that does not depend on the other service. The more dependent service (usually the agent runtime that depends on infrastructure services) should do the comprehensive dependency checking.
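A cycle like this is easy to miss in a larger stack, so it can help to check for one mechanically. The sketch below runs a depth-first search over a mapping from each service to the services its health check probes. The mapping format and the function name find_cycle are assumptions for illustration.

```python
def find_cycle(probes):
    """Return one cycle as a list of service names, or None if there is none.

    `probes` maps each service to the services its health check depends on.
    """
    UNSEEN, ON_PATH, DONE = 0, 1, 2
    state = {}
    path = []

    def visit(node):
        state[node] = ON_PATH
        path.append(node)
        for dep in probes.get(node, []):
            if state.get(dep, UNSEEN) == ON_PATH:
                # Back edge: dep is already on the current path, so this is a cycle.
                return path[path.index(dep):] + [dep]
            if state.get(dep, UNSEEN) == UNSEEN:
                cycle = visit(dep)
                if cycle:
                    return cycle
        path.pop()
        state[node] = DONE
        return None

    for service in probes:
        if state.get(service, UNSEEN) == UNSEEN:
            cycle = visit(service)
            if cycle:
                return cycle
    return None

print(find_cycle({"a": ["b"], "b": ["a"]}))  # ['a', 'b', 'a']
```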
Test your health check configuration by starting the stack fresh and monitoring the startup sequence with docker compose ps. Verify that each service waits for its dependencies before starting. Also test failure scenarios: stop a dependency and verify that the dependent service detects the failure through its health check. Keep in mind that depends_on only governs startup order, and that restart policies act on containers that exit, not on containers that are merely unhealthy, so confirm that whatever recovery mechanism you rely on actually brings the failed dependency back.
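For repeatable tests, the startup sequence can also be observed from a script. The sketch below polls the health status that docker inspect reports for a container. The function names, deadline, and polling interval are illustrative, and the injectable parameters exist only so the loop can be exercised without Docker.

```python
import subprocess
import time

def health_status(container, run=subprocess.run):
    """Return Docker's health status for a container, or "unknown" on error."""
    result = run(
        ["docker", "inspect", "--format", "{{.State.Health.Status}}", container],
        capture_output=True, text=True,
    )
    return result.stdout.strip() if result.returncode == 0 else "unknown"

def wait_until_healthy(container, deadline_seconds=120, poll_seconds=2,
                       status=health_status, sleep=time.sleep,
                       clock=time.monotonic):
    """Poll until the container is healthy; return False on unhealthy or timeout."""
    end = clock() + deadline_seconds
    while clock() < end:
        current = status(container)
        if current == "healthy":
            return True
        if current == "unhealthy":
            return False
        sleep(poll_seconds)
    return False
```

Calling wait_until_healthy on each service in turn, using the container names that docker compose ps shows, gives a scripted version of the manual check described above.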
Implement service-specific health checks using native readiness tools for databases, API endpoints for model servers, and custom dependency-checking endpoints for your agent runtime. Wire them together with depends_on conditions to create a reliable startup sequence that prevents race conditions.