Jailbreaks and Prompt Injections Threaten Security of LLMs

Record summary

A quick snapshot of what this page covers.

Techniques30Attack methods connected to this risk.

Mitigations17Defenses that may help with related attacks.

Domain2. Privacy & SecurityThe broad risk area this belongs to.

Risk profile

How this risk is described and categorized.

"LLMs are not adversarially robust and are vulnerable to security failures such as jailbreaks and prompt-injection attacks. While a number of jailbreak attacks have been proposed in the literature, the lack of standardized evaluation makes it difficult to compare them. We also do not have efficient white-box methods to evaluate adver- sarial robustness. Multi-modal LLMs may further allow novel types of jailbreaks via additional modalities. Finally, the lack of robust privilege levels within the LLM input means that jailbreaking and prompt-injection attacks may be particularly hard to eliminate altogether."

Domain2. Privacy & Security

Subdomain2.2 > AI system security vulnerabilities and attacks

Entity3 - Other

Intent3 - Other

Timing3 - Other

CategoryJailbreaks and Prompt Injections Threaten Security of LLMs

Subcategoryn/a

Related techniques

Attack methods connected to this risk.

Suggested mitigations

Defenses that may help with related attacks.

Source

Research source for this risk, when available.

Included resource

Foundational Challenges in Assuring Alignment and Safety of Large Language Models

AuthorsAnwar et al.Year2024TypePreprint

DOI10.48550/arXiv.2404.09932 URLhttps://arxiv.org/abs/2404.09932

Original source

MIT AI Risk Repository

Open the public repository used for AI risk records and taxonomy fields.

Repositoryhttps://airisk.mit.edu/

Jailbreaks and Prompt Injections Threaten Security of LLMs

Record summary

Risk profile

Suggested mitigations

Generative AI Guardrails

Generative AI Guidelines

Generative AI Model Alignment

Memory Hardening

Control Access to AI Models and Data in Production

AI Telemetry Logging

Input and Output Validation for AI Agent Components

Privileged AI Agent Permissions Configuration

Single-User AI Agent Permissions Configuration

AI Agent Tools Permissions Configuration

Human In-the-Loop for AI Agent Actions

Restrict AI Agent Tool Invocation on Untrusted Data

Segmentation of AI Agent Components

Code Signing

Control Access to AI Models and Data at Rest

AI Model Distribution Methods

AI Bill of Materials

Source

Foundational Challenges in Assuring Alignment and Safety of Large Language Models

MIT AI Risk Repository