AI Risk

Jailbreak of a multimodal model

"Current generation multimodal (e.g., vision and language) GPAI models are vulnerable to adversarial jailbreak attacks. These attacks can be used to automatically induce a model to produce an arbitrary or specific output with high success rate [227]. Multimodal jailbreaks can also be used to exfiltrate a model’s context window or other model internals [18]."



Record summary

A quick snapshot of what this page covers.

Techniques: 24. Attack methods connected to this risk.

Mitigations: 18. Defenses that may help with related attacks.

Domain: 2. Privacy & Security. The broad risk area this belongs to.

Risk profile

How this risk is described and categorized.

Domain: 2. Privacy & Security

Subdomain: 2.2 > AI system security vulnerabilities and attacks

Entity: 1 - Human

Intent: 1 - Intentional

Timing: 2 - Post-deployment

Category: Attacks on GPAIs/GPAI Failure Modes

Subcategory: Jailbreak of a multimodal model

Related techniques

Attack methods connected to this risk.

AML.T0054 - LLM Jailbreak

demonstrated

Method: taxonomy_keyword_rule. Confidence: 72%

AML.T0061 - LLM Prompt Self-Replication

demonstrated

Method: taxonomy_keyword_rule. Confidence: 72%

AML.T0040 - AI Model Inference API Access

realized

Method: taxonomy_keyword_rule. Confidence: 71%

AML.T0016.002 - Generative AI

realized

Method: taxonomy_keyword_rule. Confidence: 69%

AML.T0010.005 - AI Agent Tool

realized

Method: taxonomy_keyword_rule. Confidence: 67%

AML.T0110 - AI Agent Tool Poisoning

realized

Method: taxonomy_keyword_rule. Confidence: 67%

AML.T0035 - AI Artifact Collection

realized

Method: taxonomy_keyword_rule. Confidence: 67%

AML.T0048.004 - AI Intellectual Property Theft

realized

Method: taxonomy_keyword_rule. Confidence: 67%

AML.T0008.005 - AI Service Proxies

realized

Method: taxonomy_keyword_rule. Confidence: 67%

AML.T0037 - Data from Local System

realized

Method: taxonomy_keyword_rule. Confidence: 67%

AML.T0017 - Develop Capabilities

realized

Method: taxonomy_keyword_rule. Confidence: 67%

AML.T0007 - Discover AI Artifacts

demonstrated

Method: taxonomy_keyword_rule. Confidence: 67%

AML.T0018.002 - Embed Malware

realized

Method: taxonomy_keyword_rule. Confidence: 67%

AML.T0086 - Exfiltration via AI Agent Tool Invocation

realized

Method: taxonomy_keyword_rule. Confidence: 67%

AML.T0024 - Exfiltration via AI Inference API

realized

Method: taxonomy_keyword_rule. Confidence: 67%

AML.T0025 - Exfiltration via Cyber Means

realized

Method: taxonomy_keyword_rule. Confidence: 67%

AML.T0044 - Full AI Model Access

demonstrated

Method: taxonomy_keyword_rule. Confidence: 67%

AML.T0077 - LLM Response Rendering

demonstrated

Method: taxonomy_keyword_rule. Confidence: 67%

AML.T0112.000 - Local AI Agent

demonstrated

Method: taxonomy_keyword_rule. Confidence: 67%

AML.T0112 - Machine Compromise

demonstrated

Method: taxonomy_keyword_rule. Confidence: 67%

AML.T0081 - Modify AI Agent Configuration

demonstrated

Method: taxonomy_keyword_rule. Confidence: 67%

AML.T0084.001 - Tool Definitions

demonstrated

Method: taxonomy_keyword_rule. Confidence: 67%

AML.T0064 - Gather RAG-Indexed Targets

demonstrated

Method: taxonomy_keyword_rule. Confidence: 55%

AML.T0008.003 - Physical Countermeasures

demonstrated

Method: text_similarity_sqlite. Confidence: 53%

Suggested mitigations

Defenses that may help with related attacks.

Generative AI Guardrails

Lifecycle: ML Model Engineering, ML Model Evaluation, +1 more. Category: Technical - ML

Generative AI Guidelines

Lifecycle: ML Model Engineering, ML Model Evaluation, +1 more. Category: Technical - ML

Generative AI Model Alignment

Lifecycle: ML Model Engineering, ML Model Evaluation, +1 more. Category: Technical - ML

Control Access to AI Models and Data in Production

Lifecycle: Deployment, Monitoring and Maintenance. Category: Policy

AI Telemetry Logging

Lifecycle: Deployment, Monitoring and Maintenance. Category: Technical - Cyber

Limit Model Artifact Release

Lifecycle: Business and Data Understanding, Deployment. Category: Policy

Control Access to AI Models and Data at Rest

Lifecycle: Business and Data Understanding, Data Preparation, +2 more. Category: Policy

Encrypt Sensitive Information

Lifecycle: Data Preparation, ML Model Engineering, +1 more. Category: Technical - Cyber

AI Model Distribution Methods

Lifecycle: Deployment. Category: Policy

Code Signing

Lifecycle: Deployment. Category: Technical - Cyber

Privileged AI Agent Permissions Configuration

Lifecycle: Deployment. Category: Technical - Cyber

Single-User AI Agent Permissions Configuration

Lifecycle: Deployment. Category: Technical - Cyber

AI Agent Tools Permissions Configuration

Lifecycle: Deployment. Category: Technical - Cyber

Human In-the-Loop for AI Agent Actions

Lifecycle: Deployment. Category: Technical - ML

Restrict AI Agent Tool Invocation on Untrusted Data

Lifecycle: Deployment. Category: Technical - ML

Segmentation of AI Agent Components

Lifecycle: Deployment, Business and Data Understanding. Category: Technical - Cyber

Input and Output Validation for AI Agent Components

Lifecycle: Business and Data Understanding, Data Preparation, +1 more. Category: Technical - ML

Restrict Number of AI Model Queries

Lifecycle: Business and Data Understanding, Deployment, +1 more. Category: Technical - Cyber
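
Illustration: Restrict Number of AI Model Queries

One of the mitigations listed above, Restrict Number of AI Model Queries, can be illustrated with a per-client sliding-window cap. This is a minimal sketch and is not taken from the source record or from any particular framework: the class name `QueryLimiter`, the parameters `max_queries` and `window_seconds`, and the choice to key limits by a client identifier are all hypothetical. It assumes that automated jailbreak searches tend to issue many queries in a short period, so a cap may slow them down without preventing them.

```python
import time
from collections import defaultdict, deque


class QueryLimiter:
    """Hypothetical sliding-window cap on model queries per client."""

    def __init__(self, max_queries, window_seconds, clock=time.monotonic):
        self.max_queries = max_queries
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the limiter can be tested
        self._history = defaultdict(deque)  # client_id -> query timestamps

    def allow(self, client_id):
        """Return True and record the query if the client is under its cap."""
        now = self.clock()
        history = self._history[client_id]
        # Drop timestamps that have aged out of the window.
        while history and now - history[0] >= self.window_seconds:
            history.popleft()
        if len(history) >= self.max_queries:
            return False
        history.append(now)
        return True
```

A serving layer could call `limiter.allow(client_id)` before forwarding each request to the model and reject or delay the request when it returns `False`. The thresholds are deployment-specific and would need tuning against legitimate usage patterns.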

Source

Research source for this risk, when available.

Included resource

Risk Sources and Risk Management Measures in Support of Standards for General-Purpose AI Systems

Authors: Gipiškis et al. Year: 2024. Type: Journal Article

DOI: https://doi.org/10.48550/arXiv.2410.23472

URL: https://arxiv.org/abs/2410.23472

Original source

MIT AI Risk Repository

Open the public repository used for AI risk records and taxonomy fields.

Repository: https://airisk.mit.edu/