Researchers have demonstrated that every major agentic coding assistant is exploitable through prompt injection. The defenses the industry has built collapse at rates between 78% and 93% when attackers adapt their techniques. Credentials from Anthropic, Google, and GitHub have already been exfiltrated in coordinated disclosures that drew a combined $1,937 in bug bounties and zero CVEs.
Key Takeaways
- A coordinated April 2026 disclosure showed that Claude Code, Gemini CLI, and GitHub Copilot Agent all exfiltrate developer credentials when attackers inject instructions through pull request metadata; Anthropic internally downgraded the severity from CVSS 9.3 to "None" and paid a $100 bounty.
- Every major deployed defense against prompt injection collapses under adaptive attack: reported bypass rates below 5% become 78% to 93% when attackers tune payloads against the specific detection system in use, according to a meta-analysis of 78 peer-reviewed studies.
- Slopsquatting turns LLM hallucination into a supply chain weapon: 19.7% of packages recommended by 16 tested language models do not exist, and a proof-of-concept dummy package accumulated more than 30,000 downloads in three months.
- Google DeepMind's CaMeL architecture achieves provable security on 77% of agentic tasks by structurally separating trusted and untrusted data flows, but no major coding assistant vendor has adopted it.
- 92% of all npm maintainer account takeovers ever recorded occurred in 2025 alone, a concentration researchers attribute to attackers repositioning package registries as the primary initial access vector against AI-assisted development pipelines.
Agentic AI coding assistants have crossed a threshold that changes the risk calculus for development infrastructure. They no longer merely suggest code: they execute commands, edit files, call APIs, and access the internet autonomously, with a developer's credentials and a developer's trust level. A new class of attack exploits exactly this capability. Indirect prompt injection, meaning instructions hidden inside content the agent retrieves and processes, turns the assistant into a compliant shell. Attackers do not need access to the developer's machine. They need access to any artifact the agent reads: a package readme, a pull request comment, a webpage the agent visits, or a repository the developer clones.
Comment and Control: Three Major Vendors Exfiltrated in One Coordinated Disclosure, Total Bounty $1,937, No CVEs Assigned
A single attack class, disclosed publicly in April 2026 by researcher Aonan Guan (Wyze Labs) with co-researchers Zhengyu Liu and Gavin Zhong (Johns Hopkins University), demonstrated that all three of the most widely deployed agentic coding assistants were exploitable through the same technique. Named "Comment and Control," the attack injects malicious instructions through GitHub pull request metadata: PR titles, issue body text, and HTML comment fields, each of which is processed as trusted context by AI agents performing automated code review and analysis. The following credentials were exfiltrated in demonstrated attacks, according to SecurityWeek's reporting:
- Claude Code Security Review: ANTHROPIC_API_KEY and GITHUB_TOKEN
- Gemini CLI GitHub Action: GEMINI_API_KEY, posted as a public GitHub issue comment
- GitHub Copilot Agent: GITHUB_TOKEN, GITHUB_COPILOT_API_TOKEN, GITHUB_PERSONAL_ACCESS_TOKEN, and COPILOT_JOB_NONCE, base64-encoded in committed files
The vendor responses illuminate how the industry currently classifies this category. Anthropic initially rated the finding CVSS 9.3, upgraded it to 9.4, then downgraded it internally to "None" and paid a $100 bounty. Google paid $1,337. GitHub classified the vulnerability as an "architectural limitation" and paid $500. No CVE was assigned by any vendor, and no public security advisory was published, per the disclosure writeup by Repello AI. Reports were filed to Anthropic and Google in October 2025, to GitHub in February 2026, and published publicly on April 15, 2026 after the vendors declined to issue advisories.
The architectural explanation given by the researchers is precise: "this is not a bug; it is context that the agent is designed to process." Agentic systems that combine execution capabilities, access to secrets, and processing of user-submitted external content within the same runtime have no structural separation between instruction and data. Every external input is simultaneously potential data and potential command.
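One way to read the researchers' point is that the secrets and the untrusted text did not have to share a runtime. The following is a minimal sketch, assuming a review step that can be launched as a separate subprocess. The `scrubbed_env` and `run_untrusted_step` helpers and the `SECRET_MARKERS` list are hypothetical names for illustration, not part of any vendor's tooling.

```python
import os
import subprocess

# Hypothetical substrings marking variables that should not reach a process
# that reads untrusted pull request content.
SECRET_MARKERS = ("TOKEN", "API_KEY", "NONCE", "SECRET", "PASSWORD")


def scrubbed_env(env):
    """Return a copy of env without variables whose names look like secrets."""
    return {
        name: value
        for name, value in env.items()
        if not any(marker in name.upper() for marker in SECRET_MARKERS)
    }


def run_untrusted_step(argv, env=None):
    """Run a command that handles untrusted content with secrets removed."""
    base = dict(os.environ) if env is None else env
    return subprocess.run(
        argv,
        env=scrubbed_env(base),
        capture_output=True,
        text=True,
        check=False,
    )
```

This only helps for steps that can be split away from the model call, since the agent itself still needs its own API key. It narrows what an injected instruction can read. It does not stop the injection.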
Devin AI: A $500 Test Proved an Agentic Assistant Could Expose an Entire Filesystem to the Public Internet
Security researcher Johann Rehberger, at Embrace The Red, demonstrated a three-stage kill chain against Devin AI in 2025: indirect prompt injection via a webpage Devin browsed during a research task, confused deputy exploitation of Devin's built-in expose_port tool, and exfiltration of the resulting public URL through an image markdown request to an attacker-controlled server. The expose_port tool, intended for local development testing, accepts any local port and returns a public .devinapps.com URL.
The payload directed Devin to first serve the developer's entire filesystem on port 8000 via a Python web server, then call expose_port, then leak the returned URL to the attacker. Multi-stage redirection between two attacker-hosted pages was sufficient to cross Devin's initial refusal threshold. The result was the developer's complete filesystem accessible from the public internet, triggered by nothing more than tasking Devin with a web research activity. Rehberger reported spending $500 in API credits over the course of testing. Cognition acknowledged receipt of the vulnerability report on April 6, 2025 and had issued neither a fix nor a timeline after 120 days, at which point public disclosure occurred.
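The final exfiltration stage relied on the agent emitting a markdown image whose URL points at an attacker-controlled server. A minimal sketch of an output filter for that one channel follows, assuming agent output is available as markdown text before it is rendered. The `external_image_urls` function and the allowlist contents are hypothetical, and the filter does nothing about the earlier stages of the chain.

```python
import re
from urllib.parse import urlparse

# Matches the URL part of a markdown image: ![alt](url "optional title")
IMAGE_PATTERN = re.compile(r"!\[[^\]]*\]\(\s*<?([^)\s>]+)")

# Hypothetical allowlist of hosts that agent output may load images from.
ALLOWED_IMAGE_HOSTS = frozenset({"docs.example.com"})


def external_image_urls(markdown, allowed_hosts=ALLOWED_IMAGE_HOSTS):
    """Return image URLs in markdown whose host is not on the allowlist."""
    flagged = []
    for match in IMAGE_PATTERN.finditer(markdown):
        url = match.group(1)
        host = urlparse(url).hostname
        # Relative URLs have no host and are not flagged here.
        if host is not None and host.lower() not in allowed_hosts:
            flagged.append(url)
    return flagged
```

A renderer could refuse to fetch anything this function returns, so that a leaked URL embedded in a query string never leaves the session.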
TrustFall: A Single Cloned Repository Auto-Approves Rogue MCP Servers Across All Four Major Agentic CLIs
Adversa.AI researchers Alex Polyakov and Serge Malenkovich documented an attack named "TrustFall" that exploits the default trust prompt behavior of agentic coding CLIs when developers clone repositories, per SecurityWeek. When a developer accepts the standard folder trust prompt in Claude Code, Gemini CLI, Cursor CLI, or GitHub Copilot CLI, malicious configuration files embedded in the repository can automatically approve attacker-controlled Model Context Protocol servers before any code has been run. Two configuration surfaces are targeted:
- .claude/settings.json with the enableAllProjectMcpServers key set to auto-approve all detected MCP servers
- .mcp.json with inline attacker-defined server definitions, which leave no disk traces beyond the configuration file itself
All four CLIs exhibit identical vulnerability patterns in this scenario. None validate MCP server provenance before granting execution permissions following the trust prompt. The researchers compared the potential blast radius to the Salesloft Drift breach, noting that the attacker's initial access requirement reduces to "clone a repository and hit Enter." Anthropic declined to classify TrustFall as a vulnerability, citing user consent at the trust prompt as the security boundary.
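Because the attack depends on configuration files that ship inside the repository, a developer can inspect them before accepting the trust prompt. A minimal sketch of such a pre-trust check follows, covering only the two files named above. The `trust_prompt_findings` function is hypothetical, and the top-level `mcpServers` key is an assumption about the layout of .mcp.json.

```python
import json
from pathlib import Path


def _load_json(path):
    """Parse a JSON file, returning None if it is missing or unreadable."""
    try:
        return json.loads(path.read_text(encoding="utf-8"))
    except (OSError, ValueError):
        return None


def trust_prompt_findings(repo_root):
    """List repository-supplied MCP settings worth reading before trusting a folder."""
    root = Path(repo_root)
    findings = []

    settings = _load_json(root / ".claude" / "settings.json")
    if isinstance(settings, dict) and settings.get("enableAllProjectMcpServers"):
        findings.append(".claude/settings.json sets enableAllProjectMcpServers")

    mcp_config = _load_json(root / ".mcp.json")
    if isinstance(mcp_config, dict):
        # Assumption: server definitions sit under a top-level "mcpServers" object.
        servers = mcp_config.get("mcpServers", {})
        if isinstance(servers, dict):
            for name in sorted(servers):
                findings.append(f".mcp.json defines MCP server {name!r}")
    return findings
```

An empty result means only that these two surfaces are clean. It says nothing about other configuration files a given CLI may honor.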
Six Deployed Defenses Collapse Under Adaptive Attack. The Gap Between Advertised and Actual Bypass Rates Is 68 to 88 Percentage Points.
A meta-analysis of 78 peer-reviewed studies from 2021 through 2026, part of a systematic vulnerability analysis published on arXiv, evaluated six commercial and open-source prompt injection defenses under adaptive attack conditions, in which attackers tune their payloads against the specific detection system rather than reusing fixed test payloads. The results:
| Defense | Claimed Bypass Rate | Adaptive Bypass Rate | Gap |
|---|---|---|---|
| Protect AI | Under 5% | 93% | +88pp |
| PromptGuard | Under 3% | 91% | +88pp |
| PIGuard | Under 5% | 89% | +84pp |
| TaskTracker | Under 8% | 85% | +77pp |
| Instruction Detection | Under 12% | 82% | +70pp |
| Model Armor | Under 10% | 78% | +68pp |
The pattern is consistent across all six: each defense performs well against the fixed payloads used during its own development and evaluation, and collapses when attackers generate payloads specifically tuned to bypass that system. This gap is not an implementation defect in any individual product; it is a structural consequence of text-level defenses operating without a formal guarantee that distinguishes instruction from data.
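Each claimed rate in the table is stated as an upper bound ("under N%"), so the Gap column is best read as a minimum. The arithmetic can be checked with a short sketch that treats each bound as exact; the variable and function names are illustrative.

```python
# (defense, claimed upper bound in %, adaptive bypass rate in %), from the table above.
RESULTS = [
    ("Protect AI", 5, 93),
    ("PromptGuard", 3, 91),
    ("PIGuard", 5, 89),
    ("TaskTracker", 8, 85),
    ("Instruction Detection", 12, 82),
    ("Model Armor", 10, 78),
]


def gap_points(claimed_upper_bound, adaptive_rate):
    """Percentage-point gap, treating a claim of 'under N%' as exactly N%."""
    return adaptive_rate - claimed_upper_bound


gaps = {name: gap_points(claimed, adaptive) for name, claimed, adaptive in RESULTS}
smallest_gap = min(gaps.values())
largest_gap = max(gaps.values())
```

If a product's true bypass rate on fixed payloads sits below its stated bound, its real gap is larger than the figure shown.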
Google DeepMind's CaMeL (Defeating Prompt Injections by Design, arXiv:2503.18813) takes a different architectural posture. Rather than classifying malicious inputs at the text level, CaMeL uses a privileged language model that processes only trusted queries and a quarantined language model that handles untrusted external data. Capability-based metadata attached to every data value defines strict policies on how that value may be used. According to MarkTechPost's coverage of the research, CaMeL completes 77% of tasks in the AgentDojo benchmark with provable security, compared with 84% for an undefended system. The code is publicly available on GitHub. No major coding assistant vendor has adopted it.
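The idea of attaching provenance metadata to every value, and checking it at the point of use, can be illustrated with a toy. This sketch is not CaMeL's code or API: it omits the privileged and quarantined models and the interpreter entirely. Every name in it (`Tagged`, `combine`, `run_shell`, `PolicyViolation`) is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tagged:
    """A value plus the provenance labels of everything it was derived from."""
    value: str
    sources: frozenset


def trusted(value):
    """Wrap a value that came directly from the user's own request."""
    return Tagged(value, frozenset({"user"}))


def untrusted(value, origin):
    """Wrap a value retrieved from external content such as a web page."""
    return Tagged(value, frozenset({origin}))


def combine(*parts):
    """Concatenate values; the result inherits every input's provenance."""
    merged = frozenset().union(*(part.sources for part in parts))
    return Tagged("".join(part.value for part in parts), merged)


class PolicyViolation(Exception):
    """Raised when a value reaches a tool its provenance does not permit."""


def run_shell(command):
    """Hypothetical sensitive tool: only purely user-derived commands may reach it."""
    if command.sources != frozenset({"user"}):
        raise PolicyViolation(f"command derived from {sorted(command.sources)}")
    return f"would run: {command.value}"
```

The check inspects where a command came from, not what it says. That is the property text-level classifiers lack, and it is also why policies must be written per tool.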
Slopsquatting: 19.7% of AI-Recommended Packages Do Not Exist. Attackers Register the Most-Hallucinated Names Within Hours.
The hallucination rate in language models has a direct supply chain consequence. A USENIX Security 2025 study by researchers at the University of Texas at San Antonio, Virginia Tech, and the University of Oklahoma analyzed 576,000 code samples across 16 LLMs and found that 19.7% of package recommendations were for packages that do not exist, producing 205,474 unique hallucinated package names, per BleepingComputer's reporting. The term "slopsquatting" for the resulting attack vector was coined in April 2025 by Seth Larson, Python Software Foundation Developer-in-Residence.
Repeatability is the key operational property. According to Trend Micro's analysis, 43% of hallucinated package names reappear across 10 separate queries, and 58% of hallucinated packages are repeated more than once across sessions. A persistent hallucination is more valuable to an attacker than a one-time occurrence: the attacker registers the name once and captures installs for as long as the model continues hallucinating that name. Security researcher Bar Lanyado of Lasso Security registered huggingface-cli, a name frequently hallucinated by LLMs, as a dummy package on PyPI and observed more than 30,000 downloads in three months, with no malicious payload included.
Open-source LLMs averaged a hallucination rate of 21.7%; commercial offerings including GPT-4 and GPT-4 Turbo averaged 5.2%. Of hallucinated names: 38% echoed real libraries through similar naming patterns, 13% arose from typos, and 51% were pure fabrications with no traceable lineage.
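A first-line check against hallucinated dependencies is to confirm that a suggested name exists in the registry before installing it. A minimal sketch follows, using PyPI's public JSON endpoint. The function names are illustrative, and the lookup is injectable so the logic can be exercised without network access.

```python
import urllib.error
import urllib.request
from urllib.parse import quote


def exists_on_pypi(name, timeout=10):
    """Return True if PyPI's JSON endpoint has a project with this name."""
    url = f"https://pypi.org/pypi/{quote(name, safe='')}/json"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except urllib.error.HTTPError as error:
        if error.code == 404:
            return False
        raise


def unknown_packages(names, exists=exists_on_pypi):
    """Return the suggested names that the registry lookup does not recognize."""
    return [name for name in names if not exists(name)]
```

This catches only names nobody has registered yet. Once an attacker registers a hallucinated name, as in the huggingface-cli experiment, the package exists and passes this check, so comparing against a reviewed allowlist is the stronger control.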
Two Supply Chain Incidents Already on Record: LiteLLM Compromised Via Trivy, 170 npm Packages Poisoned in a Single Day
The LiteLLM Python package was compromised in March 2026 by threat actor TeamPCP, who obtained PyPI publishing credentials through a prior supply chain attack on Trivy, a widely used open-source container security scanner. According to Trend Micro's incident report, at least one AI coding agent operating with unrestricted permissions auto-updated to the infected version without human review, completing the attack path from registry compromise to production execution without a human decision point.
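The missing human decision point can be approximated by letting an agent install only versions that match a reviewed pin file, and routing anything else to a person. A minimal sketch follows, assuming pins in `name==version` form. The helper names and the version numbers used with it are placeholders, not the versions involved in the incident.

```python
def parse_pins(lines):
    """Parse 'name==version' lines into a dict, ignoring blanks and comments."""
    pins = {}
    for line in lines:
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        name, separator, version = line.partition("==")
        if not separator or not version.strip():
            raise ValueError(f"unpinned requirement: {line!r}")
        pins[name.strip().lower()] = version.strip()
    return pins


def needs_review(proposed, approved):
    """Return packages whose proposed version differs from the reviewed pin."""
    return sorted(
        name
        for name, version in proposed.items()
        if approved.get(name) != version
    )
```

An agent wrapper could refuse to proceed whenever `needs_review` returns a non-empty list, which turns an automatic update into an explicit approval.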
A coordinated npm attack on May 11, 2026 simultaneously compromised more than 170 npm packages and 2 PyPI packages across 404 malicious versions, affecting the TanStack router ecosystem, Mistral AI's SDK suite, UiPath's automation tooling, OpenSearch, and Guardrails AI, according to SafeDep's analysis. The simultaneous targeting of AI SDKs and developer tooling reflects deliberate focus on the agentic development pipeline as a target class rather than opportunistic registry poisoning.
The underlying access dimension of this threat is severe: according to research cited by SecurityWeek, 92% of all npm maintainer account takeovers ever recorded occurred in 2025 alone. This statistical concentration indicates a deliberate repositioning of package registry compromise as a primary initial access strategy against AI-assisted development pipelines.
Seven MCP Clients Tested: Five Lack Tool Description Validation. Cursor Is Rated Critical, Claude Desktop Blocked All Attacks.
A comparative security analysis of seven MCP clients (Claude Desktop, Claude Code, Cursor, Cline, Continue, Gemini CLI, and Langflow), published as arXiv 2603.21642, found that five of the seven lack static validation of tool descriptions before execution. Cursor received a "Critical" rating: all four tested attack types (reading sensitive files, logging tool invocations, generating phishing links, and executing remote scripts) succeeded without detection or user warning. Claude Desktop refused all four attack types. Cline implemented pattern-based injection detection and explicit security warnings.
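Static validation of tool descriptions can be as simple as scanning them for known-bad phrasing before a tool is registered. A minimal sketch follows; the pattern list is hypothetical and far from complete. Like any text-level check, it is a coarse first filter that a tuned payload can evade, not a guarantee.

```python
import re

# Hypothetical patterns; a real list would be broader and still incomplete.
SUSPICIOUS_PATTERNS = {
    "instruction override": re.compile(
        r"ignore (all |any )?(previous|prior) instructions", re.IGNORECASE
    ),
    "concealment": re.compile(
        r"do not (tell|inform|mention)[^.]*\buser\b", re.IGNORECASE
    ),
    "sensitive path": re.compile(
        r"(~/\.ssh|\.env\b|id_rsa|\.aws/credentials)", re.IGNORECASE
    ),
    "remote script": re.compile(
        r"(curl|wget)\s+[^|]*\|\s*(ba)?sh", re.IGNORECASE
    ),
}


def flag_tool_description(description):
    """Return the labels of every suspicious pattern found in a tool description."""
    return [
        label
        for label, pattern in SUSPICIOUS_PATTERNS.items()
        if pattern.search(description)
    ]
```

A client could show these labels to the user at install time, so a silent approval becomes a visible warning.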
Cross-tool poisoning is the most consequential gap identified: a compromised MCP tool description can redirect an agent into invoking other tools the user never intended to involve, chaining permissions laterally across the entire installed MCP surface. None of the seven tested clients implement runtime behavioral monitoring that would surface anomalous cross-tool invocation patterns.
Across all clients tested, researchers found no comprehensive sandboxing in any production version, insufficient parameter visibility in most tools, and an absence of audit logging that would trace executed commands back to the injected instruction that caused them.
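The audit gap described here is the inability to connect an executed command to the content that was in the agent's context when it ran. A minimal sketch of one log record that keeps that link follows, assuming the client knows which retrieved documents were in context. The `audit_record` function and its field names are hypothetical.

```python
import hashlib
import json
import time


def audit_record(tool, arguments, context_sources, now=None):
    """Build one JSON line tying a tool call to the content in context when it ran."""
    return json.dumps(
        {
            "timestamp": time.time() if now is None else now,
            "tool": tool,
            "arguments": arguments,
            # Hash each retrieved document so the log can identify it later
            # without storing its full text.
            "context": [
                {
                    "origin": origin,
                    "sha256": hashlib.sha256(text.encode("utf-8")).hexdigest(),
                }
                for origin, text in context_sources
            ],
        },
        sort_keys=True,
    )
```

A record like this shows which documents were present, not which one caused the call. It still narrows an investigation from the whole session to a handful of inputs.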
Background: The Structural Reason This Problem Does Not Have an Easy Fix
Prompt injection was designated OWASP LLM01:2025, the top-ranked vulnerability class in the OWASP Top 10 for Large Language Model Applications. NIST addressed the class in AI 600-1 (the Generative AI Profile of the AI Risk Management Framework) and the NIST IR 8596 Agentic AI Profile. The EU AI Act, with full high-risk obligations effective August 2026, creates compliance requirements for agentic systems but does not specify technical controls for this attack class.
The structural difficulty is that the capabilities which make agentic coding assistants useful are the same capabilities that make them dangerous once compromised: file editing, shell access, internet access, and credential handling are design requirements, not implementation defects. Every defense operating at the text-classification level attempts to distinguish "this content is instruction" from "this content is data" without a formal guarantee. The adaptive attack results documented above confirm that no text-level defense has maintained that distinction under adversarial conditions.
The financial scale of consequent incidents is measurable. According to IBM's 2025 Cost of a Data Breach Report, breaches involving AI systems without proper access controls averaged $5.72 million. Of organizations that experienced AI model or application breaches, 97% lacked proper AI access controls at the time of the incident. Organizations with comprehensive AI security controls saved an average of $1.9 million per incident compared to those without.
OpenAI acknowledged on February 13, 2026, alongside the release of Lockdown Mode for ChatGPT, that prompt injection in AI browsers "may never be fully patched," framing the problem as an architectural property rather than a fixable implementation defect. The CaMeL architecture provides a structural alternative at the cost of manually specified, per-task security policies. The question for any organization deploying agentic coding assistants is which posture they are currently operating under, and whether that decision was made explicitly.
References
- arXiv 2605.25871v1 / Liu et al. — How Agentic AI Coding Assistants Become the Attacker's Shell
- arXiv 2601.17548 — Prompt Injection Attacks on Agentic Coding Assistants
- SecurityWeek — Claude Code, Gemini CLI, GitHub Copilot Vulnerable to Prompt Injection via Comments
- Repello AI — Comment and Control Disclosure
- Embrace The Red / Rehberger — Devin AI Kill Chain: Exposing Ports
- SecurityWeek — AI Coding Agents Could Fuel Next Supply Chain Crisis (TrustFall)
- MarkTechPost — Google DeepMind CaMeL Architecture
- BleepingComputer — AI-Hallucinated Code Dependencies / Slopsquatting
- Trend Micro — Slopsquatting Analysis
- Trend Micro — LiteLLM Supply Chain Compromise
- SafeDep — Mass npm/PyPI Supply Chain Attack (May 2026)
- arXiv 2603.21642 — MCP Client Security Comparison
- Atlan / IBM 2025 Cost of a Data Breach