How to Self-Host Hermes Agent
Self-hosting is the most popular deployment method for Hermes Agent, giving you complete control over your data and infrastructure while keeping costs under $10 per month. This guide covers the production-ready setup process from server provisioning through security hardening.
Provision a Server
Hermes Agent's hardware requirements are modest: 1 vCPU, 2GB RAM, and 20GB storage handle typical personal assistant workloads. The most cost-effective options are Hetzner CX22 at $4.35/month, DigitalOcean Basic Droplet at $6/month, or Linode Nanode at $5/month. Any Linux distribution works, though Ubuntu 22.04 LTS and Debian 12 are the most commonly used in the community.
If you plan to run Ollama alongside Hermes for local model inference, increase the RAM to at least 8GB (16GB recommended for 8B parameter models). GPU-equipped servers are available from Lambda Labs, RunPod, and Vast.ai if you need faster local inference, though cloud API models work fine on CPU-only servers.
Install Docker
Install Docker Engine following the official documentation for your Linux distribution. Docker Compose is included with modern Docker installations as the docker compose plugin. Verify the installation by running docker --version and docker compose version. Add your user to the docker group if you want to run Docker commands without sudo, keeping in mind that membership in that group is effectively root access on the host.
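The verification step above can be scripted. The following is a minimal sketch that runs the two commands named in the text and reports whether each succeeded; the function names and the injectable runner parameter are illustrative choices, not part of Docker or Hermes.

```python
import subprocess

# The two commands the text names for verifying the installation.
CHECKS = [
    ["docker", "--version"],
    ["docker", "compose", "version"],
]


def run_check(cmd, runner=subprocess.run):
    """Return (ok, output) for one command; ok is False if it fails or is missing."""
    try:
        result = runner(cmd, capture_output=True, text=True, timeout=30)
    except (FileNotFoundError, subprocess.TimeoutExpired) as exc:
        return False, str(exc)
    output = (result.stdout or result.stderr).strip()
    return result.returncode == 0, output


def verify_docker(runner=subprocess.run):
    """Run every check and map the command string to its (ok, output) result."""
    return {" ".join(cmd): run_check(cmd, runner) for cmd in CHECKS}


# Usage on the server:
#   for name, (ok, output) in verify_docker().items():
#       print("OK  " if ok else "FAIL", name, "->", output)
```

If the first check fails with a permission error rather than a missing binary, the docker group membership is the likely cause.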
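For Ollama integration, install Ollama separately on the host (not inside Docker) for the simplest setup. Pull your preferred model with ollama pull hermes3:8b (the recommended starting model), then verify that Ollama is running and reachable at localhost:11434 on the host. Note that inside a container, localhost refers to the container itself, so a containerized agent must reach Ollama through the host's address (for example host.docker.internal, which on Linux needs an extra_hosts entry mapping it to host-gateway), and Ollama must be listening on an interface the container can reach rather than only on 127.0.0.1.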
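One way to confirm that Ollama is reachable is to query its /api/tags endpoint, which lists the models that have been pulled locally. The sketch below uses only the Python standard library; the function name is illustrative, and the base URL defaults to the address given above.

```python
import json
import urllib.error
import urllib.request


def list_ollama_models(base_url="http://localhost:11434", timeout=5):
    """Return the names of locally pulled models, or None if Ollama is unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            payload = json.load(resp)
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, or a response that is not valid JSON.
        return None
    return [model.get("name") for model in payload.get("models", [])]


# Usage on the host:
#   models = list_ollama_models()
#   if models is None:
#       print("Ollama is not reachable")
#   elif "hermes3:8b" not in models:
#       print("Ollama is up, but hermes3:8b has not been pulled yet")
```

Running the same check from inside the agent's container, with the base URL the agent will actually use, catches the localhost pitfall before it shows up as a model error.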
Configure the Agent
Create a directory structure for your Hermes installation. The recommended layout separates configuration, data, and logs into distinct directories. Create your config.yaml file with your model provider credentials, messaging platform tokens, and any custom tool or soul settings.
The configuration file supports environment variable substitution, so you can store sensitive values (API keys, bot tokens) in a .env file rather than hardcoding them. This is the recommended approach for production deployments. The Hermes repository includes example configurations for common setups that you can use as starting points.
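To illustrate what environment variable substitution means in practice, here is a minimal sketch of the idea. The ${NAME} placeholder syntax and the key name in the usage comment are assumptions made for the example; the example configurations in the Hermes repository show the syntax the agent actually accepts.

```python
import os
import re

# Matches placeholders such as ${ANTHROPIC_API_KEY} (assumed syntax).
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def substitute_env(text, env=None):
    """Replace ${NAME} placeholders with values from env, failing loudly if one is unset."""
    env = os.environ if env is None else env

    def _lookup(match):
        name = match.group(1)
        if name not in env:
            raise KeyError(f"environment variable {name} is not set")
        return env[name]

    return _PLACEHOLDER.sub(_lookup, text)


# Usage with a hypothetical config line:
#   substitute_env("api_key: ${MODEL_API_KEY}", {"MODEL_API_KEY": "sk-example"})
```

Failing on an unset variable, rather than silently substituting an empty string, makes a missing .env entry visible at startup instead of as an authentication error later.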
Set Up Persistence
The most important step for production self-hosting is configuring persistent storage. The memory database (SQLite) and skill library (markdown files) must survive container restarts, updates, and host reboots. Map these directories as Docker volumes in your compose file.
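The volume mapping can be sketched as a small generator for the relevant part of a compose file. Everything specific here is a placeholder: the container-side paths, the service name, and the image name are hypothetical and need to be replaced with the values from the Hermes documentation. The output is JSON, which Compose can read because JSON is valid YAML.

```python
import json
from pathlib import PurePosixPath

# Hypothetical container-side paths; check the image's documentation for the real ones.
CONTAINER_PATHS = {
    "config": "/app/config",
    "data": "/app/data",
    "logs": "/app/logs",
}


def service_with_volumes(host_base, image):
    """Build a Compose service mapping that bind-mounts each host directory."""
    base = PurePosixPath(host_base)
    volumes = [f"{base / name}:{target}" for name, target in CONTAINER_PATHS.items()]
    return {"image": image, "volumes": volumes}


def compose_document(host_base, image, service_name="hermes"):
    """Wrap the service in a top-level Compose document and serialize it as JSON."""
    doc = {"services": {service_name: service_with_volumes(host_base, image)}}
    return json.dumps(doc, indent=2)


# Usage (the image name is a placeholder):
#   print(compose_document("/srv/hermes", "example/hermes-agent:latest"))
```

Bind mounts to a known host directory, as opposed to anonymous volumes, keep the data in a place that a backup job can find.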
Configure regular backups of the data directory. A simple daily cron job that copies the directory to a backup location (local disk, S3, or other cloud storage) ensures you can recover from hardware failures. The backup is small, typically under 100MB even after months of operation.
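A backup job along these lines could be sketched as follows. Copying a SQLite file while it is being written can produce an inconsistent copy, so this sketch copies the ordinary files directly and snapshots the database through SQLite's online backup API instead. The database filename memory.db and its location at the top of the data directory are assumptions.

```python
import shutil
import sqlite3
from datetime import datetime, timezone
from pathlib import Path


def backup_data_dir(data_dir, backup_root, db_name="memory.db"):
    """Copy data_dir into a timestamped folder under backup_root and return its path."""
    data_dir, backup_root = Path(data_dir), Path(backup_root)
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    target = backup_root / stamp

    # Copy everything except the SQLite files, which are snapshotted separately below.
    skip = shutil.ignore_patterns(db_name, db_name + "-wal", db_name + "-shm")
    shutil.copytree(data_dir, target, ignore=skip)

    db_path = data_dir / db_name  # assumed to sit at the top level of data_dir
    if db_path.exists():
        src = sqlite3.connect(db_path)
        dst = sqlite3.connect(target / db_name)
        try:
            src.backup(dst)  # consistent snapshot even if the agent is writing
        finally:
            dst.close()
            src.close()
    return target


# Usage from a daily cron job (paths are placeholders):
#   backup_data_dir("/srv/hermes/data", "/srv/backups/hermes")
```

Syncing the resulting folder to S3 or another remote location, and pruning old snapshots, would be separate steps on top of this.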
Enable Auto-Restart and Monitoring
Set the Docker restart policy to "unless-stopped" or "always" so the agent survives server reboots and unexpected crashes. Configure log rotation to prevent the Docker logs from filling your disk over time. The agent produces moderate log volume under normal operation, but verbose logging (useful during initial setup) can generate significant output.
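In a compose file, both settings live on the service definition. The sketch below adds them to an existing service mapping using Docker's json-file logging driver and its max-size and max-file options; the 10m and 3 values are illustrative defaults, not recommendations from the Hermes project.

```python
def with_restart_and_log_rotation(service, max_size="10m", max_file=3):
    """Return a copy of a Compose service mapping with restart and log rotation set."""
    hardened = dict(service)
    hardened["restart"] = "unless-stopped"
    hardened["logging"] = {
        "driver": "json-file",
        # Keep at most max_file log files of max_size each per container.
        "options": {"max-size": max_size, "max-file": str(max_file)},
    }
    return hardened


# Usage (the image name is a placeholder):
#   service = with_restart_and_log_rotation({"image": "example/hermes-agent:latest"})
```

With these values the container's logs are capped at roughly 30MB on disk, which leaves room to raise the limits temporarily while verbose logging is enabled.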
For monitoring, the agent exposes a health check endpoint on the web dashboard port. You can use a simple uptime monitor (UptimeRobot, Healthchecks.io) to alert you if the agent goes offline. For more detailed monitoring, the agent's logs can be shipped to any log aggregation service.
Harden Security
Configure your server's firewall to allow only necessary ports. If you are using the web dashboard, restrict access to your IP address or put it behind a reverse proxy with authentication. The messaging platform connections are outbound only and do not require open ports.
Review and restrict the agent's tool permissions in config.yaml. The default configuration enables all built-in tools, which includes file system access and shell command execution. For production deployments, restrict these tools to specific directories and commands that the agent actually needs. Disable any tools you do not use to reduce the attack surface.
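To make "restrict to specific directories" concrete, here is a sketch of the kind of check a directory allowlist implies. It illustrates the concept only and is not how Hermes implements its permissions; the function name and paths are hypothetical.

```python
from pathlib import Path


def is_path_allowed(candidate, allowed_dirs):
    """Return True if candidate resolves to a location inside one of allowed_dirs."""
    # resolve() collapses ".." segments and follows symlinks, so a path such as
    # /srv/hermes/data/../secrets is judged by where it actually points.
    resolved = Path(candidate).resolve()
    for allowed in allowed_dirs:
        root = Path(allowed).resolve()
        if resolved == root or root in resolved.parents:
            return True
    return False


# Usage:
#   is_path_allowed("/srv/hermes/data/notes.md", ["/srv/hermes/data"])
#   is_path_allowed("/srv/hermes/data/../secrets", ["/srv/hermes/data"])
```

Comparing resolved paths rather than string prefixes matters: a prefix test would wrongly accept /srv/hermes/database when only /srv/hermes/data is allowed.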
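If you run Hermes in a Docker container (recommended), the container provides an additional layer of isolation. The agent cannot access the host filesystem beyond the directories you explicitly mount in the compose file, and only the ports you publish are reachable from outside. Keep in mind that a container still has outbound network access by default, so this isolation complements tool restrictions rather than replacing them.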
Updating and Maintaining Your Installation
Keeping your Hermes Agent installation current is straightforward with Docker. Updates are handled by pulling the latest image and recreating the container. The command sequence is docker compose pull followed by docker compose up -d, which downloads the new image and restarts the agent on it. The agent is briefly unavailable while the container restarts, but no data is lost, since the memory database and skill library persist in mounted volumes. Before updating, it is good practice to create a backup of your data directory in case you need to roll back.
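The update sequence is short enough to script. This sketch runs the two commands from the text in order and stops at the first failure; the function name, the compose directory argument, and the injectable runner are illustrative.

```python
import subprocess

# The update sequence described in the text, in order.
UPDATE_STEPS = [
    ["docker", "compose", "pull"],
    ["docker", "compose", "up", "-d"],
]


def update_agent(compose_dir, runner=subprocess.run):
    """Run each update step inside compose_dir; check=True aborts on the first failure."""
    completed = []
    for cmd in UPDATE_STEPS:
        runner(cmd, cwd=compose_dir, check=True)
        completed.append(" ".join(cmd))
    return completed


# Usage, after taking a backup of the data directory (path is a placeholder):
#   update_agent("/srv/hermes")
```

Stopping on a failed pull means the running container is left untouched if the new image cannot be downloaded.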
Hermes follows a rapid release cadence with frequent updates that include new features, bug fixes, and model compatibility improvements. The project uses semantic versioning, so you can pin to a specific version tag in your compose file if you prefer stability over the latest features. The community recommends updating at least monthly to stay current with security patches and model support improvements. Release notes on GitHub detail the changes in each version, helping you decide whether to update immediately or wait.
Troubleshooting Common Issues
The most common issue new self-hosters encounter is incorrect model API configuration. Symptoms include the agent starting successfully but failing to respond to messages, or responding with error messages about authentication failures. The solution is to verify your API key by testing it directly against the model provider's API endpoint before configuring it in Hermes. The agent's logs (accessible via docker compose logs) show the exact error message from the model provider, which usually points directly to the issue.
Messaging platform connectivity is the second most common problem area. Each platform has its own authentication flow, and token expiration or webhook URL misconfiguration can cause silent failures where messages are sent but never reach the agent. The Hermes web dashboard shows the connection status for each configured platform, making it easy to identify which integrations are working and which need attention. The community Discord channel has dedicated help threads for each major messaging platform with step-by-step troubleshooting guides.
Memory database corruption is rare but can occur if the container is killed without a graceful shutdown (for example, during a sudden power loss). SQLite is resilient to most crash scenarios, but severe cases may require restoring from backup. Running the agent with WAL (Write-Ahead Logging) mode enabled (the default since v0.12) provides additional protection against corruption during unexpected shutdowns.
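For readers unfamiliar with WAL mode, the following sketch shows what the setting is at the SQLite level. Since the agent enables it by default, running this should not be necessary; it only demonstrates the mechanism, and the database path is whatever file you point it at.

```python
import sqlite3


def enable_wal(db_path):
    """Switch a SQLite database file to WAL journaling and return the resulting mode."""
    con = sqlite3.connect(db_path)
    try:
        # The pragma returns the journal mode now in effect.
        mode = con.execute("PRAGMA journal_mode=WAL").fetchone()[0]
    finally:
        con.close()
    return mode


# Usage on a copy of the database, never the live file while the agent is running:
#   print(enable_wal("/tmp/memory-copy.db"))
```

The journal mode is stored in the database file itself, so it persists across connections once set.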
Performance Optimization for VPS Hosting
On budget VPS instances with limited RAM, swap configuration can prevent out-of-memory crashes during peak usage. Adding 2GB of swap space helps the agent survive temporary memory spikes without being terminated by the kernel's OOM killer. This is especially important if you are running Hermes alongside other services on the same VPS.
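Before adding swap, it helps to check how much is already configured. On Linux this is reported in /proc/meminfo; the sketch below parses that file's format and returns the configured swap size. The function names are illustrative.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo content into a dict of field name to value in kiB."""
    fields = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[key.strip()] = int(parts[0])
    return fields


def swap_total_mib(meminfo_text):
    """Return the configured swap size in MiB (0 if no swap is configured)."""
    return parse_meminfo(meminfo_text).get("SwapTotal", 0) // 1024


# Usage on the server:
#   with open("/proc/meminfo") as fh:
#       print(swap_total_mib(fh.read()), "MiB of swap configured")
```

A result of 0 on a 2GB instance is the case where adding swap makes the biggest difference.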
For users who interact with their agent heavily throughout the day, enabling prompt caching at the model provider level reduces both latency and cost. Anthropic and Google both support prompt caching, which stores the static portions of the agent's prompts (system message, tool definitions, soul file) and charges reduced rates for cached content. Since these static portions are identical across requests, caching can reduce input token costs by 20 to 30% with no impact on response quality.
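The size of the saving depends on two things: how much of each request is static, and how steeply the provider discounts cached tokens. The arithmetic can be sketched as below. Both inputs in the usage comment are illustrative assumptions rather than published pricing, and the model ignores any surcharge a provider applies when content is first written to the cache.

```python
def input_cost_reduction(cached_fraction, cached_price_ratio):
    """Fractional reduction in input-token cost when part of each prompt is cached.

    cached_fraction:    share of input tokens served from the cache (0 to 1)
    cached_price_ratio: price of a cached token relative to a normal one (0 to 1)
    """
    if not 0 <= cached_fraction <= 1:
        raise ValueError("cached_fraction must be between 0 and 1")
    if not 0 <= cached_price_ratio <= 1:
        raise ValueError("cached_price_ratio must be between 0 and 1")
    return cached_fraction * (1 - cached_price_ratio)


# Usage with assumed numbers: if 30% of input tokens are static and cached
# tokens cost one tenth of the normal rate, the reduction is 0.30 * 0.90.
#   input_cost_reduction(0.30, 0.10)
```

Under this model, the saving grows with the static share of the prompt, so short conversations with a long system message and many tool definitions benefit the most.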
Multi-Instance Deployments
Some users run multiple Hermes instances on the same server, each configured for different purposes or different user groups. A common pattern is running a personal assistant instance with full tool access on port 3000 and a team-facing instance with restricted permissions on port 3001. Docker Compose makes this straightforward by defining multiple services in the same compose file, each with its own configuration directory and data volumes.
Multi-instance deployments should account for the combined resource usage. Each Hermes instance uses approximately 200 to 400MB of RAM during normal operation, with spikes during heavy processing. A 4GB VPS can comfortably run two to three instances alongside the operating system and Docker overhead. If instances need to share skills or memories, this requires custom synchronization since each instance maintains its own independent database.
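A rough capacity estimate can be written down as a short calculation. The reserved memory and the spike headroom factor used in the usage comment are assumptions chosen for illustration; measure your own instances before relying on the result.

```python
def max_instances(total_mb, reserved_mb, per_instance_mb, spike_headroom=1.0):
    """Estimate how many instances fit in RAM.

    total_mb:        RAM on the server
    reserved_mb:     RAM set aside for the operating system and Docker (assumed)
    per_instance_mb: steady-state RAM per instance
    spike_headroom:  multiplier applied to per_instance_mb to leave room for spikes
    """
    budget = per_instance_mb * spike_headroom
    available = total_mb - reserved_mb
    if budget <= 0 or available <= 0:
        return 0
    return int(available // budget)


# Usage with assumed values: a 4GB server, 1GB reserved, 400MB per instance,
# and 2.5x headroom for spikes.
#   max_instances(4096, 1024, 400, spike_headroom=2.5)
```

With those assumed values the estimate comes out at three instances. Without any headroom the same server would nominally fit more, which is exactly the configuration that tends to run into memory pressure under load.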
Self-hosting Hermes Agent on a $5 VPS with Docker provides complete data sovereignty and costs under $10/month total. The key production steps are persistent storage, automatic restarts, regular backups, and tool permission hardening.