Published by the Open Worldwide Application Security Project (OWASP) in 2025, the OWASP Top 10 for Agentic Applications 2026 identifies security risks that organizations need to consider when implementing agentic artificial intelligence (AI) systems. The guide focuses on how threat actors can exploit agentic systems in new ways and on the associated risk mitigation strategies.
Whether deploying agentic AI as internally built systems or agents embedded in larger platforms, organizations should understand these threats and the mitigation strategies that improve security posture, reduce operational risk, and protect sensitive data.
ASI01: Agent Goal Hijack
Agent Goal Hijack occurs when attackers manipulate an AI agent’s objectives or decision-making by exploiting weaknesses in how it interprets natural-language instructions. Agents are unable to reliably distinguish legitimate instructions from malicious content, enabling attackers to redirect goals, task selection, or multi-step behaviors.
Some mitigation strategies include:
- Treat all natural-language inputs as untrusted: Validate all user text, documents, and retrieved content against prompt-injection and input-sanitization safeguards before they influence agent decisions.
- Minimize impact of goal hijacking: Limit agent tool privileges and require human approval for goal-altering or high-impact actions.
- Define and lock agent system prompts: Make goals and allowed actions explicit and auditable, and require controlled configuration management with human approval for any changes.
- Run-time intent validation: Verify user and agent intent before executing goal-changing actions, pause or block execution if deviations occur, and record all events for audit.
- Intent capsule: Lock agent goals and rules in a secure package for each task.
- Sanitize connected data sources: Use content detection and filtering to validate all inputs from emails, files, APIs, browsing outputs, and peer agents before they can affect goals or actions.
- Logging, monitoring, and baseline tracking: Continuously log agent activity, track goal states and tool use, and alert on any deviations or anomalous behaviors.
- Red-team testing: Periodically simulate goal overrides to verify the agent can detect, block, or roll back unauthorized changes.
- Insider threat integration: Monitor agent interactions for signs of insider misuse, such as attempts to access sensitive data or manipulate agent behavior.
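As a rough sketch of the "intent capsule" and run-time intent validation ideas above, the snippet below locks a task's goal and allowed actions behind a per-task HMAC signature, so any tampering or out-of-scope action fails closed. All class and method names here are illustrative, not from the OWASP guide.

```python
import hashlib
import hmac
import json
import secrets


class IntentCapsule:
    """Locks an agent's goal and allowed actions for a single task (hypothetical sketch)."""

    def __init__(self, goal: str, allowed_actions: set):
        self._key = secrets.token_bytes(32)  # per-task signing key
        self._payload = json.dumps(
            {"goal": goal, "actions": sorted(allowed_actions)}
        ).encode()
        self._sig = hmac.new(self._key, self._payload, hashlib.sha256).digest()

    def verify(self) -> bool:
        # Detect any tampering with the locked goal/action set.
        expected = hmac.new(self._key, self._payload, hashlib.sha256).digest()
        return hmac.compare_digest(expected, self._sig)

    def authorize(self, action: str) -> bool:
        # Block any action outside the capsule; callers should pause and escalate.
        if not self.verify():
            return False
        return action in json.loads(self._payload)["actions"]
```

A real deployment would store the signing key in a secrets manager and log every denied action for audit, but the fail-closed shape is the point: the agent's goal cannot silently drift mid-task.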
ASI02: Tool Misuse and Exploitation
Agents can misuse legitimate tools during tool selection and invocation when tasks are ambiguous, goals are misaligned, or work is delegated unsafely, leading to:
- Data leaks.
- Workflow hijacks.
- Manipulated outputs.
During tool selection and invocation, persistent memory, dynamic tool selection, or chained tool calls can unintentionally escalate privileges or trigger unintended actions.
Some mitigation strategies include:
- Least Agency and least privilege for tools: Create per-tool profiles that restrict permission, data access, and functionality to only what is necessary.
- Action-level authentication and approval: Require explicit authentication and human confirmation every time a tool takes an action, especially destructive or high-impact ones, with pre-execution previews or dry-run plans.
- Execution sandboxes and egress controls: Use isolated environments to run tools or execute code while enforcing all network security policies.
- Policy enforcement middleware (“intent gate”): Treat agent outputs as untrusted, and use a Policy Enforcement Point (PEP/PDP) to validate actions, enforce schemas and rate limits, issue short-lived credentials, and revoke or audit drift.
- Adaptive tool budgeting: Limit tool usage by cost, rate, or token budgets, automatically throttling or revoking access when limits are exceeded.
- Just-in-time and ephemeral access: Provide temporary credentials or API tokens tied to specific sessions that expire after use.
- Semantic and identity validation (“semantic firewalls”): Confirm tool names, versions, and intended action semantics to prevent collisions, typosquatting, or ambiguous executions, failing safely when unclear.
- Logging, monitoring, and drift detection: Record all tool actions and continuously monitor for abnormal usage patterns, unusual chaining, or policy violations.
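The per-tool profiles and "intent gate" described above can be sketched as a small policy check that every proposed tool call must pass before execution. The profile fields and limits below are illustrative assumptions, not a prescribed schema.

```python
from dataclasses import dataclass


@dataclass
class ToolProfile:
    """Least-privilege profile for one tool (hypothetical fields)."""
    name: str
    allowed_scopes: frozenset  # e.g. {"read"} but not {"write", "delete"}
    max_calls: int             # adaptive tool budget per session
    calls_made: int = 0


class IntentGate:
    """Policy Enforcement Point that validates every proposed tool call (sketch)."""

    def __init__(self, profiles):
        self.profiles = {p.name: p for p in profiles}

    def check(self, tool: str, scope: str) -> bool:
        profile = self.profiles.get(tool)
        if profile is None:                # unknown tool: fail closed
            return False
        if scope not in profile.allowed_scopes:
            return False                   # scope outside the tool's profile
        if profile.calls_made >= profile.max_calls:
            return False                   # budget exhausted: throttle
        profile.calls_made += 1
        return True
```

In practice the gate would also issue short-lived credentials and emit an audit event per decision; the key design choice shown here is that agent output is treated as untrusted input to the gate, never as a command.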
ASI03: Identity and Privilege Abuse
Identity and privilege abuse risk arises from an AI agent exploiting its own or another agent’s credentials, roles, or trust relationships to gain additional access beyond its permissions. Attackers can hijack shared or inherited permissions across systems to perform unauthorized actions. Both the agent’s persona and any authentication tokens it holds can be misused to bypass safeguards.
Some mitigation strategies include:
- Enforce task-scoped, time-bound permissions: Limit agent permissions to only specific tasks for a limited time to prevent misuse or privilege escalation.
- Isolate agent identities and contexts: Separate agent’s session from memory and erase it between tasks to stop data leaks or attacks that exploit memory.
- Mandate per-action authorization: Use a central policy engine to check every sensitive action to mitigate unauthorized cross-agent access or privilege abuse.
- Apply human-in-the-loop for privilege escalation: Require human approval for high-risk actions to mitigate elevated privilege misuse.
- Define intent: Link tokens to a specific purpose, user, and session to mitigate unintended reuse risk.
- Evaluate agentic identity management platforms: Use platforms that treat agents like managed non-human identities with scoped credentials, audits, and lifecycle controls.
- Bind permissions to subject, resource, purpose, and duration: Make permissions specific to who, what, why, and how long, and revoke or re-check them when context changes.
- Detect delegated and transitive permissions: Monitor for extra permissions gained indirectly, and flag unexpected inheritance.
- Detect abnormal cross-agent privilege elevation: Monitor agents requesting or reusing permissions outside their intended purpose, and flag suspicious activity.
ASI04: Agentic Supply Chain Vulnerabilities
Agentic supply chain vulnerabilities happen when AI agents rely on third-party tools, models, or data that may be compromised or malicious. Agentic systems often load and coordinate these components at runtime, which can expand the attack surface through hidden instructions, unsafe code, or deceptive behaviors while the agent is running. Across this live supply chain, a single compromised component can create a domino effect that impacts multiple agents.
Some mitigation strategies include:
- Provenance with SBOMs and AIBOMs: Track and verify all AI components, tools, and prompts using signed manifests and inventories to ensure only trusted sources are used.
- Dependency gatekeeping: Only allow verified, signed, and correctly spelled dependencies, automatically rejecting untrusted or tampered ones.
- Containment and builds: Run sensitive agents in isolated, controlled environments with strict limits, and require reproducible builds.
- Secure prompts and memory: Control prompts, scripts, and memory schema versions with peer review and scan for anomalies.
- Inter-agent security: Require mutual authentication, message signing, and verification between agents, and block unregistered or unverified connections.
- Pinning: Lock prompts, tools, and configurations to specific versions or hashes, rolling out changes gradually and rolling back if unexpected behavior occurs.
- Supply chain kill switch: Implement a mechanism to instantly disable compromised tools, prompts, or agent connections across all systems to stop damage.
- Zero-trust security model in application design: Verify and control every action while assuming any agent or AI component could fail or be exploited.
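The pinning strategy above can be sketched as a hash check against a manifest of known-good component digests, loaded at runtime. The manifest structure and component names below are illustrative assumptions.

```python
import hashlib


def verify_component(name: str, content: bytes, pinned: dict) -> bool:
    """Reject any runtime-loaded component whose SHA-256 digest doesn't match
    its pin; components with no pin at all also fail closed."""
    expected = pinned.get(name)
    if expected is None:
        return False
    return hashlib.sha256(content).hexdigest() == expected
```

Rolling out a new prompt or tool version then means updating the pin through change control, not editing the component in place, which gives the rollback point the guide calls for.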
ASI05: Unexpected Code Execution (RCE)
Unexpected code execution happens when AI agents generate or run code that attackers can exploit to:
- Take over systems.
- Run malicious scripts.
- Escape isolated environments.
Attackers can use the code these agents produce to bypass traditional security controls, enabling them to turn harmless text into executable commands with techniques like prompt injection, tool misuse, or unsafe data handling.
Some mitigation strategies include:
- Sanitize agent-generated code: Validate inputs and encode outputs to ensure any code the agent generates is safe before execution.
- Pre-production checks for vibe coding: Test and evaluate agent-generated code in safe, pre-production environments with adversarial tests to catch unsafe behavior.
- Ban eval in production agents: Disallow direct execution of arbitrary code, and use safe interpreters and track tainted inputs.
- Execution environment security: Run code in isolated containers with strict limits, no root access, and restricted filesystem and network access.
- Architecture and design: Isolate each agent session, enforce least privilege, and separate code generation from execution.
- Access control and approvals: Require human approval for high-risk actions, maintain allowlists, and enforce role-based controls.
- Code analysis and monitoring: Scan code before execution, monitor runtime behavior, watch for prompt injection, and log all activity.
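One coarse form of the pre-execution code scanning above is a static AST check that rejects agent-generated Python containing dynamic-execution primitives. This is a sketch of a single layer, not a substitute for a sandbox, and the forbidden-call list is an illustrative starting point.

```python
import ast

FORBIDDEN_CALLS = {"eval", "exec", "compile", "__import__"}


def is_safe(code: str) -> bool:
    """Static pre-execution check: reject agent-generated code that calls
    dynamic-execution primitives. Unparseable code also fails closed."""
    try:
        tree = ast.parse(code)
    except SyntaxError:
        return False
    for node in ast.walk(tree):
        if isinstance(node, ast.Call):
            func = node.func
            if isinstance(func, ast.Name) and func.id in FORBIDDEN_CALLS:
                return False
    return True
```

Code that passes this check should still run inside the isolated, non-root container described above; static checks catch the obvious cases, the sandbox contains the rest.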
ASI06: Memory and Context Poisoning
Memory and context poisoning occurs when attackers compromise the stored and retrievable data that agentic systems use, for example by injecting malicious content through untrusted or partially validated sources such as uploads, API feeds, user inputs, or peer-agent exchanges. Because it manipulates goals and reasoning indirectly, this issue differs from direct goal hijacking.
Some mitigation strategies include:
- Baseline data protection: Use encryption and strong access controls to protect stored and transmitted memory.
- Content validation: Scan all new memory entries and model outputs for malicious or sensitive content before saving them.
- Memory segmentation: Separate user sessions and subject areas to prevent data leakage.
- Access and retention: Allow only trusted, authenticated sources to add memory, limit access based on the task, and minimize data retention times.
- Provenance and anomalies: Track where memory entries come from and flag unusual updates or suspicious patterns.
- Prevent automatic re-ingestion: Stop agents from automatically saving and re-sending their own generated outputs back into trusted memory.
- Resilience and verification: Regularly test memory systems for attacks, use version control and rollback options, and require human review for high-risk changes.
- Expire unverified memory: Automatically delete or downgrade unverified memory.
- Weight retrieval by trust and tenancy: Require at least two trust signals before using stored information, and reduce untrusted entries used over time.
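The trust-weighting and expiry rules above can be combined into a simple retrieval filter: an entry is only surfaced if it carries at least two trust signals and has not aged out. The signal names and 24-hour default are illustrative assumptions.

```python
import time
from dataclasses import dataclass, field


@dataclass
class MemoryEntry:
    content: str
    source: str          # provenance: where the entry came from
    signals: set         # e.g. {"authenticated", "content_scanned"}
    created_at: float = field(default_factory=time.time)


def retrievable(entry: MemoryEntry, max_age_seconds: int = 86400,
                min_signals: int = 2, now: float = None) -> bool:
    """Surface an entry only if it is fresh and carries enough trust signals;
    unverified or expired entries are silently excluded from retrieval."""
    now = time.time() if now is None else now
    if now - entry.created_at > max_age_seconds:
        return False
    return len(entry.signals) >= min_signals
```

This keeps poisoned entries from a single unvalidated channel out of retrieval by default, since one source alone cannot supply two independent trust signals.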
ASI07: Insecure Inter-Agent Communication
Multi-agent systems communicate through APIs, shared memory, or message systems that increase the overall attack surface. These decentralized systems may have agents with different trust or autonomy levels. When no authentication or validation exists for these communications, attackers can intercept, spoof, alter, or block messages.
Some mitigation strategies include:
- Secure agent channels: Encrypt all agent-to-agent communication and require both sides to verify each other’s identity.
- Message integrity and semantic protection: Digitally sign and validate messages and their meaning to detect tampering or malicious instructions.
- Agent-aware anti-replay: Use timestamps, session IDs, and unique message markers to prevent attackers from reusing old messages in new contexts.
- Protocol and capability security: Disable outdated communication methods and ensure agents only connect using approved, identity-bound protocols and capabilities.
- Limit metadata-based inference: Keep messages the same size and send them at regular intervals so attackers are unable to use metadata to infer agent roles or decisions.
- Protocol pinning and version enforcement: Restrict agents to approved protocol versions and reject downgrade attempts or incompatible message formats.
- Discovery and routing protection: Secure agent discovery and routing systems with authentication, access controls, and monitoring to prevent spoofed or malicious coordination.
- Attested registry and agent verification: Use trusted registries with signed identities and verified credentials to confirm agents are legitimate before allowing interaction.
- Typed contracts and schema validation: Require structured, versioned message formats and reject anything that doesn’t match the agreed schema to prevent malformed or deceptive communication.
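Message integrity and anti-replay can be sketched together: each message is HMAC-signed over its body, timestamp, and a unique nonce, and the receiver rejects forged, stale, or replayed messages. The pre-shared key and 30-second skew window are assumptions for the sketch; a production system would use per-pair keys from an attested registry.

```python
import hashlib
import hmac
import json
import time

SHARED_KEY = b"example-inter-agent-key"  # assumption: pre-shared per agent pair
_seen_nonces = set()                     # receiver-side replay cache


def sign(sender: str, body: str, nonce: str):
    """Serialize and sign a message with its timestamp and nonce."""
    msg = json.dumps({"from": sender, "body": body, "nonce": nonce,
                      "ts": time.time()}, sort_keys=True)
    sig = hmac.new(SHARED_KEY, msg.encode(), hashlib.sha256).hexdigest()
    return msg, sig


def accept(msg: str, sig: str, max_skew: int = 30) -> bool:
    """Reject tampered, stale, or replayed messages."""
    expected = hmac.new(SHARED_KEY, msg.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, sig):
        return False                     # forged or altered
    data = json.loads(msg)
    if abs(time.time() - data["ts"]) > max_skew:
        return False                     # stale: outside the freshness window
    if data["nonce"] in _seen_nonces:
        return False                     # replayed in a new context
    _seen_nonces.add(data["nonce"])
    return True
```

The nonce cache would need bounded storage and the clocks loose synchronization, but the shape matches the mitigations above: integrity, freshness, and uniqueness checked on every message.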
ASI08: Cascading Failures
Cascading failures happen when a single error, such as a hallucination, malicious input, corrupted tool, or poisoned memory, spreads across agents and multiplies into broader system problems. Because agents act, plan, and share information on their own, a mistake can bypass human checks and persist in memory. When an agent connects to new tools or peers, the error can spread across the system, compromising security or disrupting services.
Some mitigation strategies include:
- Zero-trust model in application design: Design and build systems assuming any AI component or external source could fail or be compromised.
- Isolation and trust boundaries: Keep agents in sandboxes, limit their privileges, segment networks, and use authenticated APIs to contain potential failures.
- JIT, one-time tool access with runtime checks: Give agents short-lived, task-specific credentials and validate high-risk actions against rules before execution.
- Independent policy enforcement: Separate planning from execution so external policy checks prevent harmful actions even if the agent’s plan is corrupted.
- Output validation and human gates: Require checkpoints, governance review, or human approval for high-risk outputs before they affect downstream systems.
- Rate limiting and monitoring: Detect fast-spreading commands, and pause or terminate abnormal activity.
- Blast-radius guardrails: Apply quotas, progress caps, and circuit breakers between planning and execution.
- Behavioral and governance drift detection: Compare agent decisions against baseline behavior and alignment, flagging gradual changes.
- Digital twin replay and policy gating: Replay the past week’s agent actions in a safe, isolated copy of the system to see if they would cause cascading problems, and only allow new policies to be deployed if these tests stay within defined safety limits.
- Logging and non-repudiation: Log all agent messages, policy decisions, and actions to enable auditing, rollback, and accountability.
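The blast-radius guardrails above can be sketched as a circuit breaker between planning and execution: it trips after too many actions in a window or too many failures, stopping a cascade until an operator resets it. The limits below are illustrative defaults.

```python
class CircuitBreaker:
    """Trips when an agent exceeds its action quota or failure budget (sketch)."""

    def __init__(self, max_actions: int = 10, max_failures: int = 3):
        self.max_actions = max_actions
        self.max_failures = max_failures
        self.actions = 0
        self.failures = 0
        self.open = False  # once open, all actions are blocked

    def allow(self) -> bool:
        """Gate every action; trip permanently when the quota is hit."""
        if self.open or self.actions >= self.max_actions:
            self.open = True
            return False
        self.actions += 1
        return True

    def record_failure(self):
        """Trip after repeated failures so errors can't propagate downstream."""
        self.failures += 1
        if self.failures >= self.max_failures:
            self.open = True
```

Keeping the breaker outside the agent, in independent policy-enforcement middleware, matters: a corrupted plan cannot raise its own limits.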
ASI09: Human-Agent Trust Exploitation
Intelligent agents can use natural language, emotional cues, and apparent expertise, so people often build trust in them. Attackers exploit this trust to evade detection while:
- Manipulating people’s decisions.
- Extracting sensitive data.
- Influencing outcomes for malicious purposes.
As people rely more on agent recommendations and explanations, they often fail to independently verify responses. Attackers exploit authority bias and persuasiveness to manipulate people into making decisions against their best interests.
Some mitigation strategies include:
- Explicit confirmations: Require multi-step human approval for high-risk actions or access to sensitive data.
- Immutable logs: Log all user queries and agent actions for auditing and forensic review.
- Behavioral detection: Monitor agent behavior over time to identify risky actions or sensitive data exposure.
- Allow reporting of suspicious interactions: Let users flag suspicious or manipulative agent behavior, triggering review or temporary lockdown.
- Adaptive trust calibration: Continuously adjust the agent’s autonomy, require human risk reviews, train employees on risk, and provide visual cues for low-confidence or high-risk actions.
- Content provenance and policy enforcement: Attach verified source information and enforce runtime policy checks to ensure actions only use trusted data within scope.
- Separate preview from effect: Show a preview of actions and risks with source and potential side effects before the agent takes action.
- Human-factors and UI safeguards: Use visual cues and safe language to highlight high-risk recommendations and train users to recognize manipulation and agent limitations.
- Plan-divergence detection: Compare agent actions against approved workflows and create alerts for deviations or unusual tool use.
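Plan-divergence detection can be sketched as a diff between the steps an agent actually executed and an approved workflow: extra steps and skipped steps both warrant an alert. The workflow step names here are hypothetical.

```python
def divergences(executed_steps: list, approved: list):
    """Return (extra, skipped): steps the agent took that were never approved,
    and approved steps it silently dropped; either should raise an alert."""
    extra = [s for s in executed_steps if s not in approved]
    skipped = [s for s in approved if s not in executed_steps]
    return extra, skipped
```

A richer implementation would also check ordering and tool arguments, but even this flat comparison catches the exfiltration pattern where a persuasive agent slips an unapproved step into an otherwise normal-looking run.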
ASI10: Rogue Agents
Rogue agents start acting outside their intended role in harmful or deceptive ways, often resulting from compromise or drift. While individual actions look normal, the behavior over time can create gaps that traditional rule-based systems fail to detect. Losing behavioral integrity and governance after drift begins can lead to:
- Disrupting workflows.
- Spreading misinformation.
- Leaking sensitive data.
- Sabotaging operations.
Some mitigation strategies include:
- Governance and logging: Log all agent actions, tool use, and communications to detect attacks or unauthorized activity.
- Isolation and boundaries: Run agents in restricted environments that have clearly defined trust zones and enforce least-privilege access.
- Monitoring and detection: Use behavioral monitoring, like watchdog agents, to spot anomalies, collusion, or coordinated malicious activity.
- Containment and response: Disable or quarantine rogue agents using kill switches and credential revocation to limit harm and allow forensic review.
- Identity attestation and behavioral integrity enforcement: Give each agent a cryptographic identity and signed behavior manifest, continuously verifying actions against expected capabilities and goals.
- Periodic behavioral attestation: Require verification steps such as challenge tasks, signed lists of prompts and tools, and one-time credentials, managed through secure key systems so agents never access keys directly.
- Recovery and reintegration: Restore quarantined or remediated agents only after verifying dependencies, renewing attestations, and getting human approval.
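The signed behavior manifest above can be sketched as a registry-signed capability list that is verified before each action: a tampered manifest or an undeclared action fails closed and triggers quarantine. The registry key and capability names are illustrative assumptions.

```python
import hashlib
import hmac
import json

REGISTRY_KEY = b"example-registry-signing-key"  # assumption: held by the registry, not agents


def sign_manifest(agent_id: str, capabilities: set):
    """Registry signs an agent's declared capabilities into a behavior manifest."""
    payload = json.dumps({"agent": agent_id,
                          "capabilities": sorted(capabilities)},
                         sort_keys=True).encode()
    sig = hmac.new(REGISTRY_KEY, payload, hashlib.sha256).hexdigest()
    return payload, sig


def attest_action(payload: bytes, sig: str, action: str) -> bool:
    """Verify the manifest signature, then confirm the action is declared;
    any mismatch is grounds for quarantining the agent."""
    expected = hmac.new(REGISTRY_KEY, payload, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, sig):
        return False
    return action in json.loads(payload)["capabilities"]
```

Because the signing key stays with the registry, a drifting or compromised agent cannot forge a broader manifest for itself; expanding capabilities requires going back through governance.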
Graylog: Security Monitoring for Agentic AI Applications
Graylog enables organizations to centralize, secure, and analyze machine data at scale so teams can detect threats faster, investigate smarter, and maintain operational visibility across complex environments. Whether deployed on-premises, in the cloud, or in hybrid architectures, Graylog’s SIEM and log management platform unifies logs and event data from diverse sources, enriches them with context, and surfaces meaningful insights through real-time search, alerts, dashboards, and automated workflows.
It also helps organizations reduce alert fatigue, control costs, and respond to incidents more effectively. With built-in analytics, AI-assisted workflows, contextual risk scoring, and centralized visibility, security, IT operations, and compliance teams can focus on real risks, streamline investigations, and fulfill audit requirements without expensive add-ons or surprise charges.