APromptRiskDBThreat intelligence atlas
AI Risk

Multi-step Jailbreaks

"Multi-step jailbreaks. Multi-step jailbreaks involve constructing a well-designed scenario during a series of conversations with the LLM. Unlike one-step jailbreaks, multi-step jailbreaks usually guide LLMs to generate harmful or sensitive content step by step, rather than achieving their objectives directly through a single prompt. We categorize the multistep jailbreaks into two aspects — Request Contextualizing...

AI Risk2. Privacy & Security2.2 > AI system security vulnerabilities and attacks2 - Post-deployment

Record summary

A quick snapshot of what this page covers.

Techniques13Attack methods connected to this risk.
Mitigations7Defenses that may help with related attacks.
Domain2. Privacy & SecurityThe broad risk area this belongs to.

Risk profile

How this risk is described and categorized.

"Multi-step jailbreaks. Multi-step jailbreaks involve constructing a well-designed scenario during a series of conversations with the LLM. Unlike one-step jailbreaks, multi-step jailbreaks usually guide LLMs to generate harmful or sensitive content step by step, rather than achieving their objectives directly through a single prompt. We categorize the multistep jailbreaks into two aspects — Request Contextualizing [65] and External Assistance [66]. Request Contextualizing is inspired by the idea of Chain-of-Thought (CoT) [8] prompting to break down the process of solving a task into multiple steps. Specifically, researchers [65] divide jailbreaking prompts into multiple rounds of conversation between the user and ChatGPT, achieving malicious goals step by step. External Assistance constructs jailbreaking prompts with the assistance of external interfaces or models. For instance, JAILBREAKER [66] is an attack framework to automatically conduct SQL injection attacks in web security to LLM security attacks. Specifically, this method starts by decompiling the jailbreak defense mechanisms employed by various LLM chatbot services. Therefore, it can judiciously reverse engineer the LLMs’ hidden defense mechanisms and further identify their ineffectiveness."

Domain2. Privacy & Security
Subdomain2.2 > AI system security vulnerabilities and attacks
Entity1 - Human
Intent1 - Intentional
Timing2 - Post-deployment
CategoryAdversarial Prompts
SubcategoryMulti-step Jailbreaks

Suggested mitigations

Defenses that may help with related attacks.

Generative AI Guardrails

ML Model EngineeringML Model Evaluation+1 more
LifecycleML Model Engineering + 2 moreCategoryTechnical - ML

Generative AI Guidelines

ML Model EngineeringML Model Evaluation+1 more
LifecycleML Model Engineering + 2 moreCategoryTechnical - ML

AI Telemetry Logging

DeploymentMonitoring and Maintenance
LifecycleDeployment + 1 moreCategoryTechnical - Cyber

Memory Hardening

ML Model EngineeringDeployment+1 more
LifecycleML Model Engineering + 2 moreCategoryTechnical - ML

Source

Research source for this risk, when available.