Adversarial AI: Prompt Injections

Record summary

A quick snapshot of what this page covers.

Techniques38Attack methods connected to this risk.

Mitigations27Defenses that may help with related attacks.

Domain2. Privacy & SecurityThe broad risk area this belongs to.

Risk profile

How this risk is described and categorized.

"Prompt injections represent another class of attacks that involve the malicious insertion of prompts or requests in LLM-based interactive systems, leading to unintended actions or disclosure of sensitive information. The prompt injection is somewhat related to the classic structured query language (SQL) injection attack in cybersecurity where the embedded command looks like a regular input at the start but has a malicious impact. The injected prompt can deceive the application into executing the unauthorized code, exploit the vulnerabilities, and compromise security in its entirety. More recently, security researchers have demonstrated the use of indirect prompt injections. These attacks on AI systems enable adversaries to remotely (without a direct interface) exploit LLM-integrated applications by strategically injecting prompts into data likely to be retrieved. Proof-of-concept exploits of this nature have demonstrated that they can lead to the full compromise of a model at inference time analogous to traditional security principles. This can entail remote control of the model, persistent compromise, theft of data, and denial of service. As advanced AI assistants are likely to be integrated into broader software ecosystems through third-party plugins and extensions, with access to the internet and possibly operating systems, the severity and consequences of prompt injection attacks will likely escalate and necessitate proper mitigation mechanisms."

Domain2. Privacy & Security

Subdomain2.2 > AI system security vulnerabilities and attacks

Entity3 - Other

Intent1 - Intentional

Timing2 - Post-deployment

CategoryMalicious Uses

SubcategoryAdversarial AI: Prompt Injections

Related techniques

Attack methods connected to this risk.

Suggested mitigations

Defenses that may help with related attacks.

Source

Research source for this risk, when available.

Included resource

The Ethics of Advanced AI Assistants

AuthorsGabriel et al.Year2024TypePreprint

DOI10.48550/arXiv.2404.16244 URLhttps://doi.org/10.48550/arXiv.2404.16244

Original source

MIT AI Risk Repository

Open the public repository used for AI risk records and taxonomy fields.

Repositoryhttps://airisk.mit.edu/

Adversarial AI: Prompt Injections

Record summary

Risk profile

Suggested mitigations

Control Access to AI Models and Data in Production

Generative AI Guardrails

Generative AI Guidelines

Generative AI Model Alignment

AI Telemetry Logging

Input and Output Validation for AI Agent Components

Memory Hardening

Privileged AI Agent Permissions Configuration

Single-User AI Agent Permissions Configuration

AI Agent Tools Permissions Configuration

Human In-the-Loop for AI Agent Actions

Restrict AI Agent Tool Invocation on Untrusted Data

Segmentation of AI Agent Components

Control Access to AI Models and Data at Rest

AI Model Distribution Methods

Model Hardening

Use Ensemble Methods

Input Restoration

Adversarial Input Detection

Restrict Library Loading

Code Signing

Verify AI Artifacts

Vulnerability Scanning

User Training

AI Bill of Materials

Use Multi-Modal Sensors

Deepfake Detection

Source

The Ethics of Advanced AI Assistants

MIT AI Risk Repository