Prompt injection

Prompt injection is an attack where adversarial input instructs an AI system to ignore its operator-defined instructions, leak data, or perform unintended actions.

What is prompt injection?

Prompt injection is the canonical AI security vulnerability: an attacker submits input — directly through a chat interface (direct injection) or indirectly through a document, web page, or email the AI is asked to summarize (indirect injection) — that contains instructions overriding the system prompt. A successful injection can cause the model to leak its system prompt, reveal data from its context window, call tools the operator didn't intend to expose, or refuse to perform its actual task.

Direct injection is well known; indirect injection is the harder problem because attackers don't need access to the chat interface — they only need to plant text where the AI will eventually consume it.

Who is vulnerable

Every AI vendor that accepts untrusted input. Foundation labs (OpenAI, Anthropic, Google) have invested heavily in instruction hierarchies, but the protections are partial. Application-layer vendors that compose retrieval, tool use, and agent flows on top of a foundation model are typically more exposed because each integration point widens the attack surface. AI-powered email clients, document analyzers, and browsing agents are particularly high-risk.

Defenses and what to ask

No defense fully solves prompt injection today. Layered defenses include: input sanitization, instruction tagging, output validation, tool-use authorization gates, and user-confirmation prompts before high-stakes actions (sending email, executing code, calling external APIs). Mature AI vendors document their defense posture against the OWASP LLM Top 10 (where prompt injection is LLM01). Ask procurement-level questions: do you treat retrieved content as untrusted, do agent tool calls require human authorization for irreversible actions, what red-team testing covers indirect injection, and what is the incident-response plan for confirmed exfiltration through prompt injection.

Prompt injection

What is prompt injection?

Who is vulnerable

Defenses and what to ask

Related