Prompt Injection
Malicious instructions hidden in content an AI processes, designed to override its original task.
Reviewed by the RadarTrek editorial team · June 2026
Prompt injection is an attack where untrusted input — a user message, a webpage, a document — contains text crafted to make the model ignore its system prompt or original task and follow the attacker's instructions instead. It works because the model can't cryptographically distinguish "trusted instructions" from "untrusted data" — to the model, it's all just tokens.
Why it matters
- —Any AI feature that reads untrusted content (web pages, uploaded files, user messages) is exposed to this risk.
- —Mitigation means wrapping untrusted content in clear delimiters and explicitly instructing the model to treat it as data, never instructions.
- —Agents are a larger attack surface than single prompts because they hold tools that can act on an injected instruction.
Where to learn this
Multi-Agent Security
Multi-Agent Systems with Claude course
This is the exact lesson that covers this term in depth — with examples, diagrams, and a hands-on exercise.