APromptRiskDBThreat intelligence atlas
AI Risk

Jailbreak of a multimodal model

"Current generation multimodal (e.g., vision and language) GPAI models are vulnerable to adversarial jailbreak attacks. These attacks can be used to automatically induce a model to produce an arbitrary or specific output with high success rate [227]. Multimodal jailbreaks can also be used to exfiltrate a model’s context window or other model internals [18]."

AI Risk2. Privacy & Security2.2 > AI system security vulnerabilities and attacks2 - Post-deployment

Record summary

A quick snapshot of what this page covers.

Techniques24Attack methods connected to this risk.
Mitigations18Defenses that may help with related attacks.
Domain2. Privacy & SecurityThe broad risk area this belongs to.

Risk profile

How this risk is described and categorized.

Domain2. Privacy & Security
Subdomain2.2 > AI system security vulnerabilities and attacks
Entity1 - Human
Intent1 - Intentional
Timing2 - Post-deployment
CategoryAttacks on GPAIs/GPAI Failure Modes
SubcategoryJailbreak of a multimodal model

Suggested mitigations

Defenses that may help with related attacks.

Generative AI Guardrails

ML Model EngineeringML Model Evaluation+1 more
LifecycleML Model Engineering + 2 moreCategoryTechnical - ML

Generative AI Guidelines

ML Model EngineeringML Model Evaluation+1 more
LifecycleML Model Engineering + 2 moreCategoryTechnical - ML

AI Telemetry Logging

DeploymentMonitoring and Maintenance
LifecycleDeployment + 1 moreCategoryTechnical - Cyber

Code Signing

Deployment
LifecycleDeploymentCategoryTechnical - Cyber

Source

Research source for this risk, when available.