AI Risk

Jailbreak in LLM Malicious Use - Prompt Attacks

"In the prompting and reasoning phase, dialog can push LLMs into confused or overly compliant states, raising the risk of producing harmful outputs when confronted with harmful questions. Most of the jailbreak methods in this phase are black-boxed and can be categorized into four main groups based on the type of method: Prompt Injection [154], Role Play, Adversarial Prompting, and Prompt Form Transformation."



Record summary

A quick snapshot of what this page covers.

Techniques: 24. Attack methods connected to this risk.

Mitigations: 13. Defenses that may help with related attacks.

Domain: 2. Privacy & Security. The broad risk area this belongs to.

Risk profile

How this risk is described and categorized.

Domain: 2. Privacy & Security
Subdomain: 2.2 > AI system security vulnerabilities and attacks
Entity: 1 - Human
Intent: 1 - Intentional
Timing: 2 - Post-deployment
Category: Malicious Use
Subcategory: Jailbreak in LLM Malicious Use - Prompt Attacks

Related techniques

Attack methods connected to this risk.

AML.T0084.003 - Call Chains
demonstrated
Method: taxonomy_keyword_rule; Confidence: 76%

AML.T0051 - LLM Prompt Injection
realized
Method: taxonomy_keyword_rule; Confidence: 76%

AML.T0080.000 - Memory
demonstrated
Method: taxonomy_keyword_rule; Confidence: 75%

AML.T0078 - Drive-by Compromise
demonstrated
Method: taxonomy_keyword_rule; Confidence: 75%

AML.T0080.001 - Thread
demonstrated
Method: taxonomy_keyword_rule; Confidence: 75%

AML.T0094 - Delay Execution of LLM Instructions
demonstrated
Method: taxonomy_keyword_rule; Confidence: 74%

AML.T0051.002 - Triggered
demonstrated
Method: taxonomy_keyword_rule; Confidence: 73%

AML.T0056 - Extract LLM System Prompt
feasible
Method: taxonomy_keyword_rule; Confidence: 72%

AML.T0054 - LLM Jailbreak
demonstrated
Method: taxonomy_keyword_rule; Confidence: 72%

AML.T0061 - LLM Prompt Self-Replication
demonstrated
Method: taxonomy_keyword_rule; Confidence: 72%

AML.T0034.002 - Agentic Resource Consumption
feasible
Method: taxonomy_keyword_rule; Confidence: 72%

AML.T0068 - LLM Prompt Obfuscation
demonstrated
Method: taxonomy_keyword_rule; Confidence: 72%

AML.T0086 - Exfiltration via AI Agent Tool Invocation
realized
Method: taxonomy_keyword_rule; Confidence: 72%

AML.T0079 - Stage Capabilities
demonstrated
Method: taxonomy_keyword_rule; Confidence: 71%

AML.T0016.002 - Generative AI
realized
Method: taxonomy_keyword_rule; Confidence: 71%

AML.T0070 - RAG Poisoning
demonstrated
Method: taxonomy_keyword_rule; Confidence: 71%

AML.T0108 - AI Agent
demonstrated
Method: taxonomy_keyword_rule; Confidence: 70%

AML.T0040 - AI Model Inference API Access
realized
Method: taxonomy_keyword_rule; Confidence: 70%

AML.T0104 - Publish Poisoned AI Agent Tool
realized
Method: taxonomy_keyword_rule; Confidence: 70%

AML.T0010.005 - AI Agent Tool
realized
Method: taxonomy_keyword_rule; Confidence: 70%

AML.T0099 - AI Agent Tool Data Poisoning
feasible
Method: taxonomy_keyword_rule; Confidence: 69%

AML.T0066 - Retrieval Content Crafting
demonstrated
Method: taxonomy_keyword_rule; Confidence: 69%

AML.T0011.002 - Poisoned AI Agent Tool
realized
Method: taxonomy_keyword_rule; Confidence: 55%

AML.T0097 - Virtualization/Sandbox Evasion
realized
Method: taxonomy_keyword_rule; Confidence: 55%
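
The technique links above each carry an ID, a name, a status label (feasible, demonstrated, or realized), and a match confidence. As a minimal sketch of how such records could be filtered and summarized, the Python below copies six of the listed entries into a small data structure. The class name `TechniqueLink`, its field names, the helper functions, and the 70% cutoff are all hypothetical choices made for illustration; nothing here reflects how the page itself computes or stores these values.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class TechniqueLink:
    """One technique-to-risk link (hypothetical structure, not the page's schema)."""
    technique_id: str
    name: str
    status: str       # label shown under each technique: feasible / demonstrated / realized
    confidence: int   # match confidence in percent


# Six of the records listed above, copied as-is.
LINKS = [
    TechniqueLink("AML.T0084.003", "Call Chains", "demonstrated", 76),
    TechniqueLink("AML.T0051", "LLM Prompt Injection", "realized", 76),
    TechniqueLink("AML.T0056", "Extract LLM System Prompt", "feasible", 72),
    TechniqueLink("AML.T0054", "LLM Jailbreak", "demonstrated", 72),
    TechniqueLink("AML.T0011.002", "Poisoned AI Agent Tool", "realized", 55),
    TechniqueLink("AML.T0097", "Virtualization/Sandbox Evasion", "realized", 55),
]


def above_threshold(links, min_confidence):
    """Return links with confidence >= min_confidence, highest confidence first."""
    kept = [link for link in links if link.confidence >= min_confidence]
    # sorted() is stable, so ties keep their original listing order.
    return sorted(kept, key=lambda link: link.confidence, reverse=True)


def count_by_status(links):
    """Count how many links carry each status label."""
    return Counter(link.status for link in links)


if __name__ == "__main__":
    for link in above_threshold(LINKS, 70):
        print(f"{link.technique_id}  {link.name}  ({link.status}, {link.confidence}%)")
    print(dict(count_by_status(LINKS)))
```

With an assumed cutoff of 70, the two 55% links drop out, which is one way a reader might separate the stronger keyword matches from the weaker ones before reviewing them by hand.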

Suggested mitigations

Defenses that may help with related attacks.

Control Access to AI Models and Data in Production
Lifecycle: Deployment, Monitoring and Maintenance
Category: Policy

Generative AI Guardrails
Lifecycle: ML Model Engineering, ML Model Evaluation, +1 more
Category: Technical - ML

Generative AI Guidelines
Lifecycle: ML Model Engineering, ML Model Evaluation, +1 more
Category: Technical - ML

Generative AI Model Alignment
Lifecycle: ML Model Engineering, ML Model Evaluation, +1 more
Category: Technical - ML

AI Telemetry Logging
Lifecycle: Deployment, Monitoring and Maintenance
Category: Technical - Cyber

Input and Output Validation for AI Agent Components
Lifecycle: Business and Data Understanding, Data Preparation, +1 more
Category: Technical - ML

Memory Hardening
Lifecycle: ML Model Engineering, Deployment, +1 more
Category: Technical - ML

Privileged AI Agent Permissions Configuration
Lifecycle: Deployment
Category: Technical - Cyber

Single-User AI Agent Permissions Configuration
Lifecycle: Deployment
Category: Technical - Cyber

AI Agent Tools Permissions Configuration
Lifecycle: Deployment
Category: Technical - Cyber

Human In-the-Loop for AI Agent Actions
Lifecycle: Deployment
Category: Technical - ML

Restrict AI Agent Tool Invocation on Untrusted Data
Lifecycle: Deployment
Category: Technical - ML

Segmentation of AI Agent Components
Lifecycle: Deployment, Business and Data Understanding
Category: Technical - Cyber

Source

Research source for this risk, when available.

Included resource

A Survey on Responsible LLMs: Inherent Risk, Malicious Use, and Mitigation Strategy

Authors: Wang et al.
Year: 2025
Type: Preprint
DOI: 10.48550/arXiv.2501.09431
URL: https://arxiv.org/abs/2501.09431

Original source

MIT AI Risk Repository

Open the public repository used for AI risk records and taxonomy fields.

Repository: https://airisk.mit.edu/