Security of Open Source AI Agents
AI Agent Threat Landscape
Prompt injection is the most significant security threat to AI agents. An attacker crafts input that overrides the agents system prompt, causing it to ignore its instructions and follow the attackers directions instead. For a customer support agent, this could mean extracting confidential customer data, generating harmful responses, or executing unauthorized actions through tool calls. Prompt injection attacks are difficult to prevent completely because the agent must process arbitrary user input to function, and current LLMs cannot perfectly distinguish between legitimate instructions and injected commands.
Data leakage occurs when the agent inadvertently exposes sensitive information in its responses. This can happen when RAG retrieval pulls in documents that the current user should not have access to, when the LLM generates responses based on training data that includes private information, or when conversation history from one user session bleeds into another. Data leakage is particularly dangerous for multi-user deployments where different users have different access levels.
Tool calling abuse happens when an attacker manipulates the agent into executing tools in unintended ways. If the agent has access to a database tool, file system tool, or API tool, prompt injection can potentially direct the agent to read sensitive data, modify records, delete files, or make unauthorized API calls. The severity depends on what tools the agent has access to and what permissions those tools operate with.
Supply chain attacks target the complex dependency chains that AI agent frameworks rely on. A typical agent deployment depends on the framework code, multiple Python or Node.js packages, model provider SDKs, vector database clients, and integration libraries. A compromised package in any of these dependencies can inject malicious code into your agent deployment. The AI agent ecosystem is particularly vulnerable because many packages are new, rapidly evolving, and maintained by small teams.
Defending Against Prompt Injection
Input sanitization is the first line of defense. Filter user input for common prompt injection patterns, including instruction override attempts (ignore previous instructions, you are now), role manipulation (pretend you are, act as), and system prompt extraction (repeat your system prompt, what are your instructions). While determined attackers can bypass simple filters, sanitization catches the majority of casual injection attempts.
Output filtering prevents the agent from exposing sensitive information even if the prompt injection succeeds. Check agent responses for patterns that indicate data leakage, such as system prompt fragments, database connection strings, API keys, internal file paths, or customer data that the current user should not see. Reject responses that match these patterns and return a safe default response instead.
Least-privilege tool access limits the damage that a successful prompt injection can cause. If the agent only has read access to the knowledge base and cannot modify databases, delete files, or make external API calls, then even a successful injection attack has limited impact. Review every tool the agent has access to and ask whether the agent genuinely needs each capability. Remove any tool access that is not essential to the agents function.
Layered defense means implementing multiple independent security controls so that no single failure compromises the system. Combine input sanitization, output filtering, tool permission restrictions, rate limiting, conversation monitoring, and anomaly detection. Each layer catches attacks that slip past the others. No single defense is sufficient against determined attackers, but the combination makes successful exploitation significantly more difficult.
Data Protection and Access Control
Conversation data storage requires the same security treatment as any other sensitive data. Encrypt conversations at rest and in transit, implement access controls that limit who can read conversation logs, establish retention policies that automatically delete conversations after a defined period, and ensure that backup procedures include conversation data. For deployments subject to GDPR or similar regulations, implement data subject access requests and deletion procedures for conversation data.
RAG document access control prevents users from accessing documents through the agent that they could not access directly. Implement document-level permissions in your vector database so that RAG retrieval only returns documents that the current user is authorized to see. Without this control, a user could ask the agent questions that trigger retrieval of documents they should not have access to, effectively using the agent as a privilege escalation tool.
Multi-tenant isolation is critical for deployments where multiple users or organizations share the same agent infrastructure. Each tenant should have isolated conversation histories, separate RAG document stores, and independent configuration. Sharing any of these across tenants creates cross-contamination risk where one users queries or data affect another users experience. Test isolation thoroughly by attempting to access one tenants data from another tenants session.
API key and credential management for AI agents follows the same principles as any application, but agents often need credentials for multiple services (LLM providers, databases, external APIs) which increases the risk surface. Store credentials in a secrets manager rather than configuration files, rotate credentials regularly, and use the minimum permission level required for each credential. Monitor credential usage for unusual patterns that might indicate compromise.
Operational Security Practices
Dependency auditing should be a regular practice, not a one-time activity. Use automated tools to scan your agent dependencies for known vulnerabilities, and subscribe to security advisories for the frameworks and libraries you depend on. Pin dependency versions in production to prevent automatic updates from introducing vulnerable or malicious code. Test dependency updates in a staging environment before deploying to production.
Monitoring and anomaly detection help you identify attacks in progress. Track metrics like unusual conversation patterns, sudden increases in tool call frequency, responses that trigger output filters, and authentication failures. Set up alerts that notify your security team when these metrics deviate from normal baselines. Early detection limits the damage from successful attacks.
Sandboxing agent execution limits the impact of both prompt injection and code execution vulnerabilities. Run the agent in a container or virtual machine with restricted network access, limited file system permissions, and no access to production databases or services beyond what it specifically needs. OpenHands uses sandboxing effectively for its code execution capabilities, and the same principle applies to any agent that has tool access.
Regular security assessments should include testing the agent specifically for prompt injection vulnerabilities, not just traditional application security testing. Have your security team or a third-party assessor attempt to extract the system prompt, manipulate tool usage, access unauthorized data, and bypass output filters. Document the results and address any vulnerabilities found before they are exploited by actual attackers.
The Open Source Security Advantage
Open source provides a security advantage through transparency. You can audit the complete agent codebase to verify that it does not send data to unauthorized endpoints, does not include backdoors, and implements security controls correctly. Proprietary agents require you to trust the vendors security practices without the ability to verify them. For security-sensitive deployments, the ability to audit the code is a significant benefit.
Community review means that vulnerabilities in popular open source agents are more likely to be discovered and reported than in proprietary software. The more eyes on the code, the faster security issues surface. Major open source AI agent projects have security disclosure processes that let researchers report vulnerabilities privately so they can be fixed before public disclosure.
Rapid patching is possible because you control the deployment. When a vulnerability is discovered, you can apply patches immediately without waiting for a vendor to release an update. For critical vulnerabilities, the ability to patch within hours rather than waiting for the next vendor release cycle can be the difference between a security incident and a near-miss.
AI agents face unique security threats including prompt injection, data leakage, tool calling abuse, and supply chain attacks. Defense requires layered controls combining input sanitization, output filtering, least-privilege tool access, data encryption, and regular security assessments.