AI Agent Risk Categories and Severity Levels

Updated May 2026

AI agent risks span four major categories: security threats, operational failures, compliance violations, and reputational damage. Understanding how to classify these risks by category and severity is essential for allocating safety resources effectively and building governance frameworks that match the actual risk profile of each agent deployment.

The OWASP Agentic AI Risk Framework

The OWASP Foundation released its Top 10 for Agentic Applications in December 2025, establishing the first widely adopted taxonomy of risks specific to autonomous AI systems. This framework identifies ten categories designated ASI01 through ASI10, each representing a distinct failure mode in the agentic technology stack. The categories cover agent planning corruption, unsafe tool execution, identity and access failures, supply chain vulnerabilities, code execution risks, memory poisoning, inter-agent communication exploits, cascading failures, human-agent trust breakdowns, and rogue agent behavior.
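To use the taxonomy in tooling, you can encode the ten identifiers as a lookup table that risk registers and incident tickets reference. The sketch below uses this article's short summaries rather than OWASP's official category titles. It assumes the identifiers follow the order in which the categories are listed above. The `tag_risk` helper is hypothetical.

```python
from enum import Enum


class ASI(Enum):
    """OWASP Agentic Top 10 identifiers, labelled with this article's summaries
    (assumed to follow the order listed above; not OWASP's official titles)."""

    ASI01 = "agent planning corruption"
    ASI02 = "unsafe tool execution"
    ASI03 = "identity and access failures"
    ASI04 = "supply chain vulnerabilities"
    ASI05 = "code execution risks"
    ASI06 = "memory poisoning"
    ASI07 = "inter-agent communication exploits"
    ASI08 = "cascading failures"
    ASI09 = "human-agent trust breakdowns"
    ASI10 = "rogue agent behavior"


def tag_risk(risk_id: str, categories: list[ASI]) -> dict:
    """Hypothetical helper: attach ASI identifiers to an internal risk-register entry."""
    return {"risk": risk_id, "asi": sorted(c.name for c in categories)}
```

For example, a memory-poisoning finding that also enables unsafe tool calls would be recorded as `tag_risk("R-017", [ASI.ASI06, ASI.ASI02])`. Sorting the identifiers keeps entries comparable across reviews.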

The OWASP framework is particularly valuable because it distinguishes agentic risks from the broader LLM risk landscape. Prompt injection and data poisoning appear in both the LLM and agentic top ten lists. Beyond these, the agentic framework addresses three areas that the LLM list covers only partially. The first is tool use, where the agent can take actions in the world rather than just producing text. The second is multi-step reasoning, where a single prompt injection can compound across many decision steps. The third is inter-agent communication, where agents exchange messages through protocols such as the Model Context Protocol (MCP) that create novel attack surfaces.

Security Risk Categories

Security risks represent the most immediately dangerous category because they involve adversarial actors actively attempting to compromise agent systems. These risks require both preventive controls and detective mechanisms to address effectively.

Prompt Injection

Prompt injection attacks manipulate the agent into performing unintended actions by embedding malicious instructions in user inputs or data sources. Direct injection targets the agent interface, while indirect injection poisons the data the agent consumes from external sources. In an agentic context, successful prompt injection can trigger tool execution, data exfiltration, privilege escalation, and persistent compromise. This risk is classified as critical severity because it can lead to arbitrary action execution with whatever permissions the agent holds.
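Injected text acts with the agent's own permissions. One common containment pattern is therefore to track where each piece of context came from and to hold high-impact tool calls for human approval once untrusted content is present. The sketch below illustrates that idea. The tool names, source labels, and `authorize_tool_call` function are hypothetical. The pattern complements least-privilege permissions and does not replace them.

```python
from dataclasses import dataclass, field

# Hypothetical tools whose misuse would be severe if an injected instruction triggered them.
SENSITIVE_TOOLS = {"send_email", "transfer_funds", "update_acl"}

# Assumption: only operator-controlled text is trusted. User messages (direct
# injection) and retrieved documents or tool output (indirect injection) are not.
TRUSTED_SOURCES = {"system", "operator"}


@dataclass
class AgentContext:
    # Origin label of every text segment placed in the model's context for this task.
    sources: list[str] = field(default_factory=list)

    @property
    def tainted(self) -> bool:
        """True once any untrusted segment has entered the context."""
        return any(s not in TRUSTED_SOURCES for s in self.sources)


def authorize_tool_call(ctx: AgentContext, tool: str) -> str:
    """Route sensitive tool calls to a human when untrusted text could have steered them."""
    if tool in SENSITIVE_TOOLS and ctx.tainted:
        return "require_approval"
    return "allow"
```

Taint is never cleared within a task. This is deliberately conservative: once an attacker's text is in the context, every later decision could be influenced by it.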

Data Exfiltration

Agents with access to sensitive data can be manipulated or malfunction in ways that expose that data to unauthorized parties. Exfiltration can occur through agent responses, logging systems, memory persistence, or side channels created by the agent interacting with external services. The severity depends on the sensitivity of the data accessible to the agent, ranging from low for public data to critical for personally identifiable information, financial records, or trade secrets.
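Severity depends on the most sensitive data the agent can reach, so you can compute a rating from the classification labels in its data scope. In this sketch, only the endpoints come from the text: public data is low, and personal, financial, and trade-secret data are critical. The intermediate labels and the function name are illustrative assumptions.

```python
from enum import IntEnum


class Severity(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


# Assumed mapping from data classification labels to exfiltration severity.
# "internal" and "confidential" are illustrative middle tiers.
DATA_CLASS_SEVERITY = {
    "public": Severity.LOW,
    "internal": Severity.MEDIUM,
    "confidential": Severity.HIGH,
    "pii": Severity.CRITICAL,
    "financial": Severity.CRITICAL,
    "trade_secret": Severity.CRITICAL,
}


def exfiltration_severity(accessible_data: set[str]) -> Severity:
    """Rate exfiltration risk by the most sensitive class the agent can read.
    Unknown labels count as critical so that unclassified data fails safe."""
    if not accessible_data:
        return Severity.LOW
    return max(DATA_CLASS_SEVERITY.get(c, Severity.CRITICAL) for c in accessible_data)
```

Taking the maximum rather than an average reflects how exfiltration works in practice: one critical dataset in scope is enough to make the whole agent critical.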

Privilege Escalation

Agents may be tricked into using their granted permissions in unintended ways, effectively escalating the privileges of the attacker beyond what they would normally have. An agent with write access to a configuration file could be manipulated into modifying access control rules, granting broader access than originally intended. Privilege escalation is classified as high to critical severity depending on the maximum permissions available to the agent and the sensitivity of systems reachable through those permissions.
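The configuration-file example suggests a concrete guard: even inside a broad write grant, carve out the files that govern permissions, so the agent cannot be used to widen access. The following is a minimal sketch. The directory layout, patterns, and `may_write` function are hypothetical.

```python
import posixpath
from fnmatch import fnmatchcase

# Hypothetical write grant for a configuration-management agent.
WRITE_SCOPE = ("/srv/app/config/*",)

# Hypothetical access-control files carved out of that grant. In fnmatch
# patterns "*" also matches "/", so these apply at any directory depth.
PROTECTED = ("*/acl*.yaml", "*/iam/*", "*/sudoers")


def may_write(path: str) -> bool:
    """Allow a write only inside the granted scope and never to a file that
    governs permissions."""
    path = posixpath.normpath(path)  # collapse "../" segments before matching
    in_scope = any(fnmatchcase(path, p) for p in WRITE_SCOPE)
    protected = any(fnmatchcase(path, p) for p in PROTECTED)
    return in_scope and not protected
```

Normalizing the path first matters. Otherwise a request such as `/srv/app/config/../../../etc/passwd` would match the scope pattern textually while pointing outside it.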

Supply Chain Compromise

The tools, plugins, APIs, and model components that agents depend on represent supply chain risks that extend beyond the organization's direct control. A compromised tool can feed malicious data to the agent, a vulnerable API can be exploited through agent-initiated requests, and a poisoned model component can systematically bias agent decisions in ways that are difficult to detect. Supply chain risks are particularly insidious because they can persist for extended periods before discovery and affect every agent that uses the compromised component.
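One basic control for locally loaded tools and plugins is to pin each artifact to the digest recorded when it was reviewed, and to refuse anything that does not match. The sketch below assumes a simple name-to-SHA-256 lockfile. The tool name and `verify_tool` function are hypothetical.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


# Hypothetical lockfile mapping tool name -> SHA-256 of the artifact that passed review.
# A real lockfile would store the hex digest itself; it is computed here only to
# keep the example self-contained.
REVIEWED_ARTIFACT = b"weather_tool v1.2 reviewed bundle"
PINNED = {"weather_tool": sha256_hex(REVIEWED_ARTIFACT)}


def verify_tool(name: str, artifact: bytes) -> bool:
    """Load a tool only if its bytes match the digest recorded at review time.
    Tools missing from the lockfile are rejected rather than trusted by default."""
    expected = PINNED.get(name)
    return expected is not None and sha256_hex(artifact) == expected
```

Pinning catches tampered or silently updated artifacts. It does not detect a remote API whose behavior changes, or a model component that was already poisoned when it was reviewed.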

Operational Risk Categories

Operational risks arise from agent malfunctions, misconfigurations, and environmental failures rather than adversarial attacks. These risks are often more common than security incidents but can be equally damaging to business operations and customer trust.

Hallucination and Confabulation

Agents can generate confident but incorrect outputs, leading to actions based on fabricated information that the agent presents as factual. In a chatbot context this produces wrong answers that humans can verify, but in an agent context it can trigger irreversible real-world actions based on false premises. An agent might confidently state that a customer account has a specific balance, then execute a transaction based on that fabricated number. This risk ranges from medium to critical depending on the domain and the irreversibility of the actions the agent can take.
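The account-balance example points to a simple rule: before an irreversible action, re-read any fact the agent relied on from the system of record, and block the action if the agent's premise disagrees. This is a minimal sketch. The in-memory `LEDGER` stands in for a real database, and all names are hypothetical.

```python
from dataclasses import dataclass
from decimal import Decimal

# Hypothetical system of record; in production this would be a database or core-banking API.
LEDGER = {"acct-42": Decimal("1250.00")}


@dataclass
class TransferRequest:
    account_id: str
    amount: Decimal
    claimed_balance: Decimal  # the balance the agent stated in its reasoning


def execute_transfer(req: TransferRequest) -> str:
    """Execute an irreversible transfer only if the agent's stated premise matches an
    authoritative read; the decision itself never uses the agent's claimed figure."""
    actual = LEDGER.get(req.account_id)
    if actual is None:
        return "rejected: unknown account"
    if actual != req.claimed_balance:
        return "rejected: agent premise does not match ledger"
    if req.amount <= 0 or req.amount > actual:
        return "rejected: invalid amount"
    LEDGER[req.account_id] = actual - req.amount
    return "executed"
```

A mismatch between the agent's claim and the ledger is itself a useful signal. Logging it gives a direct measure of how often the agent confabulates in this domain.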

Cascading Failures

In multi-agent systems, one malfunctioning agent can corrupt the inputs and behavior of downstream agents, creating a cascade of failures that amplifies the original problem far beyond its initial scope. Cascading failures are particularly dangerous because they can rapidly exceed any single agent's containment boundary and affect systems that have no direct connection to the original failure. The severity is high for any multi-agent deployment and critical for systems where agent outputs trigger automated actions in production environments.
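One way to keep a failure inside its original boundary is a circuit breaker between agents, a pattern borrowed from distributed systems. After repeated invalid outputs, the breaker stops forwarding an upstream agent's messages. This sketch assumes the caller supplies its own validation of each message, such as a schema check. The class and parameter names are illustrative.

```python
import time
from typing import Optional


class CircuitBreaker:
    """Sits between an upstream and a downstream agent. After `max_failures`
    consecutive outputs fail validation, it stops forwarding (opens) so one
    malfunctioning agent cannot keep corrupting everything downstream."""

    def __init__(self, max_failures: int = 3, cooldown_s: float = 60.0):
        self.max_failures = max_failures
        self.cooldown_s = cooldown_s
        self.failures = 0
        self.opened_at: Optional[float] = None

    def allow(self) -> bool:
        if self.opened_at is None:
            return True  # closed: traffic flows normally
        # Half-open: once the cooldown has elapsed, let a message through as a probe.
        return time.monotonic() - self.opened_at >= self.cooldown_s

    def record(self, output_valid: bool) -> None:
        if output_valid:
            self.failures = 0
            self.opened_at = None
        else:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()  # (re)open and restart the cooldown
```

Use it by calling `allow()` before forwarding a message, validating the message, and then calling `record()`. Messages blocked while the breaker is open can go to a quarantine queue for review instead of being silently dropped.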

Resource Exhaustion

Runaway agents can consume excessive computational resources, API calls, database connections, or network bandwidth, causing denial of service for other systems and accumulating significant financial costs. Without rate limiting and resource quotas, a single malfunctioning agent can exhaust shared resources and impact entirely unrelated services that share the same infrastructure. This risk is medium severity for isolated agents but high for agents sharing infrastructure with production systems that serve customers.
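Rate limiting and resource quotas can be enforced per agent with two small primitives. A token bucket bounds call rate. A hard spend cap bounds cumulative cost. The sketch below is a single-process illustration with hypothetical names. A shared deployment would need the counters in a shared store.

```python
import time


class TokenBucket:
    """Per-agent rate limit: bursts of up to `capacity` calls, refilled at
    `rate_per_s`. Calls over the limit are refused rather than queued."""

    def __init__(self, capacity: int, rate_per_s: float):
        self.capacity = capacity
        self.rate_per_s = rate_per_s
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def try_acquire(self, cost: float = 1.0) -> bool:
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate_per_s)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class SpendCap:
    """Hard budget in cents; once reached, further billable calls are refused."""

    def __init__(self, limit_cents: int):
        self.limit_cents = limit_cents
        self.spent_cents = 0

    def charge(self, cents: int) -> bool:
        if self.spent_cents + cents > self.limit_cents:
            return False
        self.spent_cents += cents
        return True
```

Refusing excess calls, rather than queueing them, prevents a runaway agent from building an unbounded backlog that keeps consuming shared resources after it is stopped.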

Configuration Drift

Agent configurations, permissions, and connected tools can change over time without corresponding updates to safety controls or risk assessments. Developers add new capabilities during feature development without revisiting security reviews. Tool interfaces change without triggering safety reassessments. The cumulative effect of configuration drift is an agent whose actual risk profile diverges significantly from its documented risk profile, undermining the entire governance framework that depends on accurate risk classification.
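You can detect drift mechanically by diffing the configuration that passed risk review against what is actually deployed. This sketch assumes both are available as dictionaries with `tools` and `permissions` lists. The key names and `drift_report` function are hypothetical.

```python
def drift_report(reviewed: dict, live: dict) -> dict:
    """Compare the reviewed agent configuration with the deployed one.
    Additions expand the risk profile and trigger re-review; removals only
    mean the documentation needs updating."""
    report = {}
    for key in ("tools", "permissions"):
        before, after = set(reviewed.get(key, [])), set(live.get(key, []))
        report[f"{key}_added"] = sorted(after - before)
        report[f"{key}_removed"] = sorted(before - after)
    report["needs_rereview"] = bool(report["tools_added"] or report["permissions_added"])
    return report
```

Running this on every deployment, rather than on a review calendar, keeps the documented risk profile from silently falling behind the real one.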

Compliance Risk Categories

Compliance risks arise when agent operations violate regulatory requirements, contractual obligations, or industry standards. These risks carry direct financial penalties and can restrict an organization's ability to operate in regulated markets.

Data Protection Violations

Agents that process personal data without adequate safeguards violate GDPR, CCPA, and other data protection regulations. Common violations include:

- collecting more data than the agent's task requires
- retaining data in agent memory systems beyond permitted periods
- processing data for purposes beyond the original consent
- failing to honor data subject access or deletion requests that extend to agent storage

Penalties under GDPR can reach 20 million euros or 4% of global annual turnover, whichever is higher. That makes this a critical severity risk for any organization processing EU resident data.
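Two of those obligations, retention limits and deletion requests, apply directly to agent memory. The sketch below keeps memory entries tagged by data subject so both can be enforced. The 30-day period in the usage is a placeholder, not a legal requirement. All class and method names are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone


@dataclass
class MemoryEntry:
    subject_id: str  # the data subject this entry is about
    content: str
    created_at: datetime


@dataclass
class AgentMemory:
    """Agent long-term memory that enforces a retention period and supports
    erasure requests that must reach agent storage."""

    retention: timedelta
    entries: list[MemoryEntry] = field(default_factory=list)

    def remember(self, subject_id: str, content: str, now: datetime) -> None:
        self.entries.append(MemoryEntry(subject_id, content, now))

    def purge_expired(self, now: datetime) -> int:
        """Drop entries older than the retention period; returns how many were removed."""
        cutoff = now - self.retention
        kept = [e for e in self.entries if e.created_at >= cutoff]
        removed = len(self.entries) - len(kept)
        self.entries = kept
        return removed

    def erase_subject(self, subject_id: str) -> int:
        """Handle a deletion request for one data subject; returns how many entries were removed."""
        kept = [e for e in self.entries if e.subject_id != subject_id]
        removed = len(self.entries) - len(kept)
        self.entries = kept
        return removed
```

A real deployment must apply the same purge and erasure to every derived copy, including embeddings in vector stores, caches, logs, and backups. Otherwise the request is only partially honored.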

Regulated Industry Violations

Healthcare agents must comply with HIPAA requirements for protected health information, financial agents with SOX and SEC regulations for transaction integrity, and agents handling payment data with PCI DSS standards for cardholder security. Each regulatory framework imposes specific requirements on data handling, access controls, audit trails, and incident reporting that must be implemented at the agent level. Non-compliance can result in fines, license revocations, and criminal penalties depending on the jurisdiction and severity of the violation.
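Audit trails recur across these frameworks. A common way to make an agent's action log tamper-evident is hash chaining: each entry commits to the previous entry's hash. This is a minimal sketch of that technique. The field names are assumptions. The log alone does not satisfy any specific regulation's audit requirements.

```python
import hashlib
import json
from datetime import datetime, timezone

GENESIS = "0" * 64


def _digest(body: dict) -> str:
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


class AuditLog:
    """Append-only, hash-chained log of agent actions. Each entry commits to the
    previous entry's hash, so altering or deleting a past entry breaks the chain."""

    FIELDS = ("ts", "actor", "action", "detail", "prev")

    def __init__(self) -> None:
        self.entries: list[dict] = []

    def append(self, actor: str, action: str, detail: dict) -> dict:
        body = {
            "ts": datetime.now(timezone.utc).isoformat(),
            "actor": actor,
            "action": action,
            "detail": dict(detail),  # copy so the caller's dict is not shared with the log
            "prev": self.entries[-1]["hash"] if self.entries else GENESIS,
        }
        entry = {**body, "hash": _digest(body)}
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute every hash and link; False if any entry was altered or removed."""
        prev = GENESIS
        for e in self.entries:
            body = {k: e[k] for k in self.FIELDS}
            if e["prev"] != prev or _digest(body) != e["hash"]:
                return False
            prev = e["hash"]
        return True
```

The chain is only tamper-evident if the latest hash is periodically anchored somewhere the agent and its operators cannot write to. Anyone able to rewrite every entry could otherwise recompute the whole chain.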

EU AI Act Non-Compliance

The EU AI Act classifies AI systems by risk level and imposes escalating obligations for higher-risk categories. Autonomous agents that make decisions affecting health, safety, or fundamental rights are likely to fall into the high-risk category. That category requires formal risk management systems, data governance practices, technical documentation, transparency measures, human oversight mechanisms, and demonstrated accuracy and robustness. Penalties reach up to 35 million euros or 7% of global annual turnover, whichever is higher, for prohibited AI practices. Breaches of other obligations, including those for high-risk systems, carry penalties of up to 15 million euros or 3%. Most obligations become enforceable in August 2026.
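Article 99 of the Act expresses each ceiling as a fixed amount or a share of worldwide annual turnover: 35 million euros or 7% for prohibited practices, and 15 million euros or 3% for most other obligations. For large undertakings the higher figure applies, while for SMEs it is the lower. Which term binds therefore depends on company size. The tier labels and `max_fine` function below are illustrative. The sketch computes ceilings only, not likely fines.

```python
from decimal import Decimal

# Fine ceilings under EU AI Act Article 99: (fixed amount in EUR, share of worldwide turnover).
# Tier labels are illustrative names, not terms from the Act.
CEILINGS = {
    "prohibited_practice": (Decimal("35000000"), Decimal("0.07")),
    "other_obligation": (Decimal("15000000"), Decimal("0.03")),  # includes high-risk duties
}


def max_fine(tier: str, turnover_eur: Decimal, sme: bool = False) -> Decimal:
    """Upper bound on the fine: the higher of the two amounts for most
    undertakings, the lower of the two for SMEs."""
    fixed, share = CEILINGS[tier]
    turnover_based = turnover_eur * share
    return min(fixed, turnover_based) if sme else max(fixed, turnover_based)
```

For a company with 1 billion euros in turnover, the turnover-based term for a prohibited practice (70 million euros) exceeds the fixed 35 million. That is why the percentage, not the headline figure, drives exposure for large organizations.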

Reputational Risk Categories

Reputational risks arise when agent behavior damages the organization's public image, customer trust, or stakeholder relationships. While harder to quantify financially than compliance penalties, reputational damage can have lasting effects on revenue, partnerships, and talent acquisition that persist long after the triggering incident is resolved.

Biased or discriminatory outputs from agents damage the organization's reputation with affected communities and the broader public. Unauthorized communications sent by agents on behalf of the organization create PR crises when those messages are inaccurate, inappropriate, or poorly timed. Customer data incidents involving agents attract media attention that can define public perception of the organization for years.

Building a Severity Classification Framework

Organizations should classify agent risks using a severity framework that considers both the likelihood of occurrence and the potential impact if the risk materializes. A four-level severity scale provides sufficient granularity for most organizations while remaining practical to implement and communicate.

Critical severity applies to risks that could cause irreversible harm, major financial loss exceeding defined thresholds, regulatory enforcement actions, or widespread data exposure affecting thousands of individuals.

High severity covers risks that could cause significant operational disruption lasting more than a defined period, meaningful financial impact, or limited data exposure affecting a defined number of individuals.

Medium severity addresses risks that could cause moderate operational impact, minor financial consequences within acceptable tolerance, or internal data exposure limited to authorized personnel.

Low severity covers risks with minimal operational or financial impact that can be addressed through normal operational processes without escalation.
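A likelihood-by-impact matrix is a common way to turn these definitions into a repeatable rating. The sketch below scores both on a 1 to 4 scale and multiplies them. The score thresholds are illustrative assumptions to be calibrated per organization. The two impact overrides keep the result consistent with the impact-based definitions above.

```python
from enum import IntEnum


class Severity(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


def classify(likelihood: int, impact: int) -> Severity:
    """Combine 1-4 likelihood and 1-4 impact scores into one of four severity levels.

    Thresholds are illustrative. Impact 4 (irreversible harm, regulatory action)
    is never rated below HIGH however unlikely, and impact 1 (minimal) is always LOW.
    """
    if not (1 <= likelihood <= 4 and 1 <= impact <= 4):
        raise ValueError("likelihood and impact must be integers from 1 to 4")
    score = likelihood * impact  # ranges from 1 to 16
    if score >= 12:
        level = Severity.CRITICAL
    elif score >= 8:
        level = Severity.HIGH
    elif score >= 4:
        level = Severity.MEDIUM
    else:
        level = Severity.LOW
    if impact == 4:
        level = max(level, Severity.HIGH)
    elif impact == 1:
        level = Severity.LOW
    return level
```

The impact floor matters because a pure product would rate a rare but catastrophic event (likelihood 1, impact 4) as merely medium. That is exactly the kind of risk a severity framework should not average away.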

Each severity level should map to specific response requirements. Critical and high severity risks require immediate attention, senior leadership notification, and sign-off before the associated agent can continue operating. Medium risks should be addressed within defined timelines and tracked through the governance process. Low risks should be documented and reviewed during regular governance cycles.
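The response requirements can be encoded as policy data so that deployment tooling, not just process documents, decides whether an agent may keep running. In this sketch, the remediation windows are placeholders, since the text leaves exact timelines to each organization. The `may_operate` function is hypothetical.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional


@dataclass(frozen=True)
class ResponsePolicy:
    notify_leadership: bool
    signoff_required_to_operate: bool
    # None means "review at the next regular governance cycle".
    remediation_window: Optional[timedelta]


# Windows are placeholders; exact timelines are set by each organization.
POLICIES = {
    "critical": ResponsePolicy(True, True, timedelta(0)),  # immediate
    "high": ResponsePolicy(True, True, timedelta(0)),      # immediate
    "medium": ResponsePolicy(False, False, timedelta(days=30)),
    "low": ResponsePolicy(False, False, None),
}


def may_operate(open_risks: dict[str, str], signed_off: set[str]) -> bool:
    """An agent keeps running only if every open risk whose policy demands
    sign-off (critical and high) has been signed off."""
    return all(
        risk_id in signed_off
        for risk_id, severity in open_risks.items()
        if POLICIES[severity].signoff_required_to_operate
    )
```

Wiring a check like this into the deployment pipeline turns the sign-off requirement into a gate. An unapproved high-severity risk then blocks the release rather than relying on someone to remember the rule.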

Key Takeaway

AI agent risks span security, operational, compliance, and reputational categories. Use the OWASP Agentic AI framework as a starting taxonomy, then classify each risk by severity based on likelihood and impact to prioritize your safety investments and governance resources effectively.