Record summary
A quick snapshot of what this page covers.
Risk profile
How this risk is described and categorized.
"Prompt injections represent another class of attacks that involve the malicious insertion of prompts or requests in LLM-based interactive systems, leading to unintended actions or disclosure of sensitive information. The prompt injection is somewhat related to the classic structured query language (SQL) injection attack in cybersecurity where the embedded command looks like a regular input at the start but has a malicious impact. The injected prompt can deceive the application into executing the unauthorized code, exploit the vulnerabilities, and compromise security in its entirety. More recently, security researchers have demonstrated the use of indirect prompt injections. These attacks on AI systems enable adversaries to remotely (without a direct interface) exploit LLM-integrated applications by strategically injecting prompts into data likely to be retrieved. Proof-of-concept exploits of this nature have demonstrated that they can lead to the full compromise of a model at inference time analogous to traditional security principles. This can entail remote control of the model, persistent compromise, theft of data, and denial of service. As advanced AI assistants are likely to be integrated into broader software ecosystems through third-party plugins and extensions, with access to the internet and possibly operating systems, the severity and consequences of prompt injection attacks will likely escalate and necessitate proper mitigation mechanisms."
Suggested mitigations
Defenses that may help with related attacks.
Control Access to AI Models and Data in Production
Generative AI Guardrails
Generative AI Guidelines
Generative AI Model Alignment
AI Telemetry Logging
Input and Output Validation for AI Agent Components
Memory Hardening
Privileged AI Agent Permissions Configuration
Single-User AI Agent Permissions Configuration
AI Agent Tools Permissions Configuration
Human In-the-Loop for AI Agent Actions
Restrict AI Agent Tool Invocation on Untrusted Data
Segmentation of AI Agent Components
Control Access to AI Models and Data at Rest
AI Model Distribution Methods
Model Hardening
Use Ensemble Methods
Input Restoration
Adversarial Input Detection
Restrict Library Loading
Code Signing
Verify AI Artifacts
Vulnerability Scanning
User Training
AI Bill of Materials
Use Multi-Modal Sensors
Deepfake Detection
Source
Research source for this risk, when available.
Included resource
The Ethics of Advanced AI Assistants
Original source
MIT AI Risk Repository
Open the public repository used for AI risk records and taxonomy fields.