Prompt injection

Record summary

A quick snapshot of what this page covers.

Techniques43Attack methods connected to this risk.

Mitigations30Defenses that may help with related attacks.

Domain2. Privacy & SecurityThe broad risk area this belongs to.

Risk profile

How this risk is described and categorized.

"Prompt Injections are a form of Adversarial Input that involve manipulating the text instructions given to a GenAI system (Liu et al., 2023). Prompt Injections exploit loopholes in a model’s architec- tures that have no separation between system instructions and user data to produce a harmful output (Perez and Ribeiro, 2022). While researchers may use similar techniques to test the robustness of GenAI models, malicious actors can also leverage them. For example, they might flood a model with manipulative prompts to cause denial-of-service attacks or to bypass an AI detection software."

Domain2. Privacy & Security

Subdomain2.2 > AI system security vulnerabilities and attacks

Entity1 - Human

Intent1 - Intentional

Timing2 - Post-deployment

CategoryMisuse tactics to compromise GenAI systems (Model integrity)

SubcategoryPrompt injection

Related techniques

Attack methods connected to this risk.

Suggested mitigations

Defenses that may help with related attacks.

Source

Research source for this risk, when available.

Included resource

Generative AI Misuse: A Taxonomy of Tactics and Insights from Real-World Data

AuthorsMarchal & XuYear2024TypeJournal Article

DOIhttps://doi.org/10.48550/arXiv.2406.13843 URLhttps://arxiv.org/abs/2406.13843

Original source

MIT AI Risk Repository

Open the public repository used for AI risk records and taxonomy fields.

Repositoryhttps://airisk.mit.edu/

Record summary

Risk profile

Suggested mitigations

Control Access to AI Models and Data in Production

Generative AI Guardrails

Generative AI Guidelines

Generative AI Model Alignment

AI Telemetry Logging

Input and Output Validation for AI Agent Components

Memory Hardening

Restrict Library Loading

Code Signing

Vulnerability Scanning

User Training

AI Bill of Materials

Verify AI Artifacts

Use Ensemble Methods

Control Access to AI Models and Data at Rest

Encrypt Sensitive Information

AI Model Distribution Methods

Privileged AI Agent Permissions Configuration

Single-User AI Agent Permissions Configuration

AI Agent Tools Permissions Configuration

Human In-the-Loop for AI Agent Actions

Restrict AI Agent Tool Invocation on Untrusted Data

Segmentation of AI Agent Components

Model Hardening

Use Multi-Modal Sensors

Input Restoration

Adversarial Input Detection

Deepfake Detection

Sanitize Training Data

Maintain AI Dataset Provenance

Source

Generative AI Misuse: A Taxonomy of Tactics and Insights from Real-World Data

MIT AI Risk Repository