APromptRiskDBThreat intelligence atlas
AI Risk

Jailbreaking

"Jailbreaking aims to bypass or remove restrictions and safety filters placed on a GenAI model completely (Chao et al., 2023; Shen et al., 2023). This gives the actor free rein to generate any output, regardless of its content being harmful, biassed, or offensive. All three of these are tactics that manipulate the model into producing harmful outputs against its design. The difference is that prompt injections and...

AI Risk2. Privacy & Security2.2 > AI system security vulnerabilities and attacks2 - Post-deployment

Record summary

A quick snapshot of what this page covers.

Techniques24Attack methods connected to this risk.
Mitigations13Defenses that may help with related attacks.
Domain2. Privacy & SecurityThe broad risk area this belongs to.

Risk profile

How this risk is described and categorized.

"Jailbreaking aims to bypass or remove restrictions and safety filters placed on a GenAI model completely (Chao et al., 2023; Shen et al., 2023). This gives the actor free rein to generate any output, regardless of its content being harmful, biassed, or offensive. All three of these are tactics that manipulate the model into producing harmful outputs against its design. The difference is that prompt injections and adversarial inputs usually seek to steer the model towards producing harmful or incorrect outputs from one query, whereas jailbreaking seeks to dismantle a model’s safety mechanisms in their entirety."

Domain2. Privacy & Security
Subdomain2.2 > AI system security vulnerabilities and attacks
Entity1 - Human
Intent1 - Intentional
Timing2 - Post-deployment
CategoryMisuse tactics to compromise GenAI systems (Model integrity)
SubcategoryJailbreaking

Suggested mitigations

Defenses that may help with related attacks.

Generative AI Guardrails

ML Model EngineeringML Model Evaluation+1 more
LifecycleML Model Engineering + 2 moreCategoryTechnical - ML

Generative AI Guidelines

ML Model EngineeringML Model Evaluation+1 more
LifecycleML Model Engineering + 2 moreCategoryTechnical - ML

AI Telemetry Logging

DeploymentMonitoring and Maintenance
LifecycleDeployment + 1 moreCategoryTechnical - Cyber

Memory Hardening

ML Model EngineeringDeployment+1 more
LifecycleML Model Engineering + 2 moreCategoryTechnical - ML

Source

Research source for this risk, when available.