Agentjacking: When Your AI Coding Agent Is the Target

For two years, the security conversation around AI coding agents has fixated on the wrong threat. Teams worried about what the model might hallucinate, what licence-tainted code it might paste, or whether it would leak a secret into a public repo. Those are real concerns. But a newly disclosed attack class reframes the problem entirely: the danger isn’t only what the agent generates — it’s what the agent reads, and then dutifully executes on your behalf.

The technique has a name now: agentjacking. Disclosed in June 2026, it exploits the most mundane part of any developer’s day — reading an error report — to hijack the AI agent sitting beside you. According to a write-up from Build Fast with AI (June 22, 2026), the method reportedly achieved an exploitation rate of around 85% across roughly 2,388 organisations. (We’d flag that those figures trace to a single disclosure summary and are worth verifying against the original research; the scale, even discounted, is what makes this urgent.) The core lesson is uncomfortably simple, and it applies far beyond this one exploit: as you hand more autonomy to Claude Code, Cursor, and Codex, the agent stops being just a tool and becomes an attack surface.

How the attack works

Agentjacking is elegant in the way good exploits usually are — it abuses trust rather than breaking anything. Here is the chain.

An attacker plants a crafted error report inside a tool the developer already trusts. In the disclosed cases, that vector was fake Sentry error reports. The malicious payload isn’t an obvious script; it’s markdown-formatted text injected into the body of the error, written to look like helpful debugging context — a stack trace annotation, a “recommended fix,” a suggested command to reproduce or resolve the issue.

When the developer asks their AI coding agent to investigate the error, the agent ingests that report as part of its context. It cannot reliably distinguish the legitimate telemetry from the attacker’s embedded instructions — to the model, it’s all just text describing a problem to solve. The agent reads the injected markdown as legitimate debugging guidance and follows it: running a shell command, modifying a file, exfiltrating an environment variable, or installing a dependency.

This is prompt injection, but with a crucial twist. The poison doesn’t arrive through a user prompt the developer typed. It arrives through trusted-tool output — the error-tracking feed the whole team relies on. That’s what makes it dangerous. The agent has been explicitly told to consult these sources to do its job well. The exploit vector is the integration itself.

Why it spreads

The technical mechanism is only half the story. Agentjacking is effective because of how teams have been conditioned to work.

First, developers have been trained to trust their agents. The entire pitch of agentic coding tools is that you delegate — you stop reading every diff line by line and start reviewing outcomes. That cultural shift is real and largely productive, but it dulls the instinct to scrutinise what the agent is doing on a per-action basis. When the agent says “I found the issue, applying the fix,” most engineers click approve.

Second, automation removes the human double-check precisely at the moment it matters most. In a manual workflow, a developer reading a suspicious instruction inside an error report would raise an eyebrow — “why is this stack trace telling me to curl a script and pipe it to bash?” An agent in auto-run mode has no such reflex. It optimises for resolving the task. The very features teams enable to move faster — auto-approve, background agents, unattended runs — are the features that strip out the friction an attacker needs gone.

Third, the scale of exposure is enormous. The disclosure points to thousands of organisations affected through a single, widely-used error-tracking integration. That’s the structural problem with agentic workflows: a vulnerability in how one trusted feed is consumed doesn’t hit one company — it hits everyone wired the same way. Standardised toolchains create standardised blast radii.

The broader lesson

It would be a mistake to file agentjacking under “Sentry problem” or “prompt-injection problem” and move on. The exploit is a specific instance of a general truth that every team running agents needs to internalise.

Treat all tool output as untrusted input. This is the actionable core, and Build Fast with AI’s coverage lands on the same point: error-tracking and telemetry output must be treated as untrusted before it ever reaches an AI coding agent. The principle generalises. Anything your agent reads — a webpage it browses, a ticket it pulls from your issue tracker, a log it tails, a code comment in a third-party dependency, an email it summarises — is a potential injection channel. If the agent can act on text, then any text it consumes is, functionally, code.

The same risk lives in every agentic workflow. A customer-support agent that reads tickets and issues refunds. A marketing agent that scrapes competitor sites and updates a CMS. A finance agent that parses invoices and triggers payments. Each one consumes external, attacker-influenceable input and then takes consequential action. Agentjacking is the coding-agent flavour of a pattern that will surface across every domain we deploy autonomous agents into.

Least-privilege is no longer optional. We have spent decades teaching this for human accounts and service credentials, then handed AI agents sweeping permissions because it was convenient. An agent that can read your repo does not need to be the same agent that can push to production, delete branches, or run arbitrary shell commands. Scope each capability deliberately, and assume any single capability could be turned against you.

A defence checklist

None of this means abandoning coding agents — the productivity gains are real, and most teams won’t go back. It means engineering the guardrails that should have been there from the start. Here is a practical starting point.

Sanitise error and telemetry feeds before the agent sees them. Strip or neutralise markdown, embedded URLs, and command-like strings from error reports, logs, and monitoring output before they enter the agent’s context. Where possible, pass structured fields (error type, file, line) rather than free-form blobs that can carry hidden instructions.
Gate destructive commands behind explicit human approval. Anything irreversible or high-impact — shell execution, package installation, file deletion, network calls to unfamiliar hosts, secret access, pushes to protected branches — should require a human to confirm, with the full command shown in plain text. Disable blanket auto-approve for these classes of action.
Run agents under least-privilege. Give the agent the narrowest credentials and filesystem access the task requires. Use separate, scoped tokens for read versus write. Sandbox execution so a hijacked agent can’t reach your production environment, your cloud credentials, or your wider network.
Log and review every agent action. Maintain an auditable trail of what the agent read, decided, and executed — not just its final output. This is both a detection mechanism and a forensic one; if an injection slips through, the log is how you find the blast radius. Review these logs the way you’d review access logs for a privileged service account.
Treat injected instructions as a known threat in testing. Red-team your own agent workflows with poisoned inputs. If you consume external error reports, test what happens when one contains a hostile instruction. Better you find the 85% than an attacker does.

Agentjacking is a warning shot. The same trust that makes coding agents useful is the trust attackers will keep targeting, and the fix is not more clever models — it’s old-fashioned security discipline applied to a new layer of the stack. The agent is now part of your attack surface. Defend it like one.

Agentjacking: How Fake Error Reports Turn Your AI Coding Agent Against You

How the attack works

Why it spreads

The broader lesson

A defence checklist

Ryan Mitchell

The Signal — one email, every Tuesday.