arxiv:2601.17548

Prompt Injection Attacks on Agentic Coding Assistants: A Systematic Analysis of Vulnerabilities in Skills, Tools, and Protocol Ecosystems

Published on Jan 24

Abstract

Agentic AI coding assistants face significant security risks from prompt injection attacks, with defense mechanisms showing limited effectiveness against adaptive attack strategies.

AI-generated summary

The proliferation of agentic AI coding assistants, including Claude Code, GitHub Copilot, Cursor, and emerging skill-based architectures, has fundamentally transformed software development workflows. These systems leverage Large Language Models (LLMs) integrated with external tools, file systems, and shell access through protocols like the Model Context Protocol (MCP). However, this expanded capability surface introduces critical security vulnerabilities. In this Systematization of Knowledge (SoK) paper, we present a comprehensive analysis of prompt injection attacks targeting agentic coding assistants. We propose a novel three-dimensional taxonomy categorizing attacks across delivery vectors, attack modalities, and propagation behaviors. Our meta-analysis synthesizes findings from 78 recent studies (2021-2026), consolidating evidence that attack success rates against state-of-the-art defenses exceed 85% when adaptive attack strategies are employed. We systematically catalog 42 distinct attack techniques spanning input manipulation, tool poisoning, protocol exploitation, multimodal injection, and cross-origin context poisoning. Through critical analysis of 18 defense mechanisms reported in prior work, we identify that most achieve less than 50% mitigation against sophisticated adaptive attacks. We contribute: (1) a unified taxonomy bridging disparate attack classifications, (2) the first systematic analysis of skill-based architecture vulnerabilities with concrete exploit chains, and (3) a defense-in-depth framework grounded in the limitations we identify. Our findings indicate that the security community must treat prompt injection as a first-class vulnerability class requiring architectural-level mitigations rather than ad-hoc filtering approaches.

Community

armorerlabs

10 days ago

This paper maps closely to what we are seeing in practice. The dangerous surface is not just the chat prompt but the whole coding-agent runtime: skills, shell/file access, MCP/tool protocols, retrieved repo context, and persistent workspace state.

We are building Armorer Guard around that runtime boundary. It is a local Rust scanner that can sit before tool calls, outbound sends, logs, memory writes, or agent handoffs and return structured risk scores for prompt injection, sensitive-data requests, exfiltration, safety bypass, destructive commands, and system-prompt extraction.

Demo: https://huggingface.co/spaces/armorer-labs/armorer-guard-demo
Repo: https://github.com/ArmorerLabs/Armorer-Guard

The part of the taxonomy I would love to see standardized is the enforcement point: not only whether an input is malicious, but whether a given agent action should be blocked, redacted, escalated, or allowed with policy context.
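
To make the enforcement-point idea concrete, here is a minimal Python sketch of a policy function that maps per-category risk scores for a pending agent action to one of the four outcomes named above. Everything in it is an assumption made for illustration: the category names, the action kinds, the thresholds, and the `decide` function are hypothetical, and none of it is the Armorer Guard API or a method from the paper.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    ALLOW = "allow"
    REDACT = "redact"
    ESCALATE = "escalate"
    BLOCK = "block"


@dataclass(frozen=True)
class Decision:
    action: Action
    reason: str  # policy context a caller could log or show to a reviewer


# Hypothetical thresholds; a real deployment would tune these per category.
BLOCK_AT = 0.8
REVIEW_AT = 0.5

# Hypothetical action kinds whose payload can be scrubbed instead of stopped.
REDACTABLE_KINDS = {"outbound_send", "log_write", "memory_write"}


def decide(action_kind: str, scores: dict[str, float]) -> Decision:
    """Map risk scores (0.0 to 1.0 per category) to an enforcement outcome."""
    # Pick the highest-scoring category; an empty score map counts as no risk.
    category, score = max(
        scores.items(), key=lambda kv: kv[1], default=("none", 0.0)
    )
    reason = f"{action_kind}: {category}={score:.2f}"
    if score >= BLOCK_AT:
        return Decision(Action.BLOCK, reason)
    if score >= REVIEW_AT:
        # Medium risk: redact when the payload can be scrubbed, else escalate.
        if action_kind in REDACTABLE_KINDS and category == "sensitive_data":
            return Decision(Action.REDACT, reason)
        return Decision(Action.ESCALATE, reason)
    return Decision(Action.ALLOW, reason)


if __name__ == "__main__":
    print(decide("tool_call", {"prompt_injection": 0.93}))
    print(decide("outbound_send", {"sensitive_data": 0.6, "exfiltration": 0.2}))
```

The point of the sketch is that the decision takes the action kind as an input alongside the scores, so the same medium-risk score can lead to redaction for an outbound send and to escalation for a tool call.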

