How to Deploy LangGraph Applications
The gap between a working LangGraph agent in development and a reliable production deployment is significant. Development uses MemorySaver for checkpoints (which loses data on restart), runs as a single process (which cannot handle concurrent requests), and relies on manual testing (which misses intermittent failures). Production needs durable persistence, horizontal scaling, automated monitoring, and operational tooling.
Choose Your Deployment Strategy
LangSmith Deployment (Managed): The fastest path to production. You push your code with a langgraph.json configuration file, and the platform handles container building, scaling, task queues, and health monitoring. Use the langgraph deploy CLI command to deploy. The platform supports zero-downtime deployments with automatic rollback if health checks fail. This option costs money but saves significant engineering time.
Self-Hosted with Docker and Kubernetes: Package your LangGraph agent as a Docker container behind a FastAPI API layer. Deploy to Kubernetes for horizontal scaling and orchestration. You manage the full infrastructure stack including load balancing, auto-scaling, certificate management, and deployment pipelines. This approach is free of platform fees but requires DevOps expertise.
Hybrid: Use LangSmith for tracing and observability while self-hosting the agent itself. This gives you the monitoring benefits of the managed platform without depending on it for compute. Many teams start fully managed and migrate to hybrid as their operational maturity grows.
Configure Production Checkpointing
Replace MemorySaver with PostgresSaver for durable checkpoint storage. PostgresSaver stores checkpoints in a PostgreSQL database, providing crash recovery, horizontal scaling through connection pooling, and concurrent access from multiple agent processes.
Set up a PostgreSQL instance, either a managed service like AWS RDS, Google Cloud SQL, or Azure Database for PostgreSQL, or a self-managed installation. Create a dedicated database for checkpoints, configure connection pooling (pgBouncer is common) for handling concurrent agent runs, and set appropriate retention policies to manage storage growth.
The switch from MemorySaver to PostgresSaver requires no changes to your graph code. Only the checkpointer instantiation changes: instead of creating a MemorySaver, you create a PostgresSaver with your database connection string. Existing graph logic, nodes, edges, and state definitions remain identical.
For teams on AWS, DynamoDBSaver is an alternative that integrates with Amazon's serverless database. DynamoDB's pay-per-use pricing can be cost-effective for workloads with variable checkpoint volumes.
Set Up the API Layer
Wrap your LangGraph agent in a FastAPI application that exposes HTTP endpoints for client access. At minimum, you need an endpoint that accepts user messages and returns agent responses, an endpoint that creates new conversation threads, and optionally a streaming endpoint that sends agent responses token by token for real-time user experience.
Add authentication to protect your agent API. API keys are the simplest approach for server-to-server communication. OAuth or JWT tokens are appropriate for user-facing applications. Rate limiting prevents individual clients from overwhelming the agent with requests.
Configure proper request timeouts. Agent workflows can take seconds to minutes depending on the number of LLM calls and tool invocations. Set HTTP timeouts high enough to accommodate your longest expected workflow, and implement long-polling or WebSocket connections for extended interactions.
For the managed platform, LangSmith Deployment provides the API layer automatically. The langgraph deploy command creates endpoints for invoking the agent, streaming responses, managing threads, and inspecting state.
Configure Monitoring and Observability
Production agent systems need comprehensive monitoring because agents are non-deterministic. The same input can produce different outputs on different runs, and failures often manifest as incorrect behavior rather than crashes.
LangSmith tracing provides the deepest integration with LangGraph, capturing every node execution, LLM call, tool invocation, and state transition. Traces let you see exactly what happened during any agent run, identify failure patterns, and measure quality metrics. The Plus tier ($39 per seat per month) includes enough traces for most production workloads.
For teams that prefer open-source observability, Langfuse is a popular alternative that provides similar tracing and evaluation capabilities without vendor lock-in. Other options include OpenTelemetry-based tracing, custom logging solutions, and general-purpose application monitoring platforms.
Set up alerting for key metrics: agent error rates, average response latency, tool call failure rates, and checkpoint storage growth. Alerts should fire before issues affect users, not after.
Deploy and Validate
For managed deployment, run langgraph deploy from your project directory. The platform builds your container, runs health checks, and promotes it to production. Monitor the deployment dashboard for any issues during rollout.
For self-hosted deployment, build your Docker image, push it to your container registry, and update your Kubernetes deployment. Use rolling updates to avoid downtime during deployments. Run smoke tests that exercise the full agent workflow including tool calls, checkpoint persistence, and error handling.
Validate that checkpointing works correctly by starting a conversation, restarting the agent process, and resuming the conversation. The agent should pick up exactly where it left off with full context preserved. This test catches checkpointing configuration issues that would cause data loss in production.
Confirm that monitoring is capturing data by reviewing traces in LangSmith or your chosen observability tool. Verify that node executions, LLM calls, and tool invocations all appear with correct timing and metadata.
Production Middleware
LangGraph v1.1 (December 2025) introduced middleware for production reliability. Model retry middleware provides configurable exponential backoff for LLM API calls, handling transient failures without custom retry logic in your nodes. Content moderation middleware filters unsafe content from agent responses before they reach users. These middleware components plug into the graph compilation step and require no changes to your node logic.
Scaling Considerations
Horizontal scaling for LangGraph agents means running multiple agent processes behind a load balancer, all sharing the same PostgreSQL checkpoint store. Each process handles its own subset of concurrent requests, and the shared database ensures state consistency. Connection pooling is essential to prevent the database from being overwhelmed by checkpoint read and write operations.
For very high traffic, consider partitioning checkpoints across multiple database instances or using DynamoDB's auto-scaling capabilities. Monitor checkpoint serialization time, as very large state objects can create bottlenecks during high-concurrency periods.
Production LangGraph deployment requires PostgresSaver for durable checkpointing, an API layer for client access, and monitoring for operational visibility. The managed LangSmith Deployment platform handles these requirements automatically, while self-hosted deployments provide more control at the cost of infrastructure management.