Prompt Injection

Malicious instructions hidden in content an AI processes, designed to override its original task.

Reviewed by the RadarTrek editorial team · June 2026

Prompt injection is an attack where untrusted input — a user message, a webpage, a document — contains text crafted to make the model ignore its system prompt or original task and follow the attacker's instructions instead. It works because the model can't cryptographically distinguish "trusted instructions" from "untrusted data" — to the model, it's all just tokens.

Why it matters

—Any AI feature that reads untrusted content (web pages, uploaded files, user messages) is exposed to this risk.
—Mitigation means wrapping untrusted content in clear delimiters and explicitly instructing the model to treat it as data, never instructions.
—Agents are a larger attack surface than single prompts because they hold tools that can act on an injected instruction.

Where to learn this

🎓

Multi-Agent Security

Multi-Agent Systems with Claude course

This is the exact lesson that covers this term in depth — with examples, diagrams, and a hands-on exercise.

Related terms

System Prompt AI Agent

← All glossary terms