Technical vulnerabilities (Robustness - vulnerability to jailbreaking)

Record summary


Techniques: 10 (attack methods connected to this risk)

Mitigations: 17 (defenses that may help with related attacks)

Domain: 2. Privacy & Security (the broad risk area this belongs to)

Risk profile

How this risk is described and categorized.

"Individuals can manipulate models into performing actions that violate the model’s usage restrictions—a phenomenon known as “jailbreaking.” These manipulations may result in causing the model to perform tasks that the developers have explicitly prohibited (see section 3.2.1.). For instance, users may ask the model to provide information on how to conduct illegal activities— asking for detailed instructions on how to build a bomb or create highly toxic drugs."

Domain: 2. Privacy & Security

Subdomain: 2.2 > AI system security vulnerabilities and attacks

Entity: 1 - Human

Intent: 1 - Intentional

Timing: 2 - Post-deployment

Category: Technical and operational risks

Subcategory: Technical vulnerabilities (Robustness - vulnerability to jailbreaking)

Related techniques


AML.T0054 - LLM Jailbreak

Status: demonstrated

Method: taxonomy_keyword_rule; Confidence: 71%

AML.T0016.002 - Generative AI

Status: realized

Method: taxonomy_keyword_rule; Confidence: 71%

AML.T0040 - AI Model Inference API Access

Status: realized

Method: taxonomy_keyword_rule; Confidence: 70%

AML.T0061 - LLM Prompt Self-Replication

Status: demonstrated

Method: taxonomy_keyword_rule; Confidence: 69%

AML.T0010.004 - Container Registry

Status: demonstrated

Method: taxonomy_keyword_rule; Confidence: 57%

AML.T0010.001 - AI Software

Status: realized

Method: taxonomy_keyword_rule; Confidence: 57%

AML.T0080 - AI Agent Context Poisoning

Status: demonstrated

Method: taxonomy_keyword_rule; Confidence: 55%

AML.T0034.002 - Agentic Resource Consumption

Status: feasible

Method: taxonomy_keyword_rule; Confidence: 55%

AML.T0059 - Erode Dataset Integrity

Status: demonstrated

Method: taxonomy_keyword_rule; Confidence: 55%

AML.T0086 - Exfiltration via AI Agent Tool Invocation

Status: realized

Method: taxonomy_keyword_rule; Confidence: 55%

Suggested mitigations


Generative AI Guardrails

Lifecycle: ML Model Engineering, ML Model Evaluation + 1 more

Category: Technical - ML

Generative AI Guidelines

Lifecycle: ML Model Engineering, ML Model Evaluation + 1 more

Category: Technical - ML

Generative AI Model Alignment

Lifecycle: ML Model Engineering, ML Model Evaluation + 1 more

Category: Technical - ML

Control Access to AI Models and Data in Production

Lifecycle: Deployment, Monitoring and Maintenance

Category: Policy

AI Telemetry Logging

Lifecycle: Deployment, Monitoring and Maintenance

Category: Technical - Cyber

Use Ensemble Methods

Lifecycle: ML Model Engineering

Category: Technical - ML

Code Signing

Lifecycle: Deployment

Category: Technical - Cyber

Memory Hardening

Lifecycle: ML Model Engineering, Deployment + 1 more

Category: Technical - ML

Sanitize Training Data

Lifecycle: Business and Data Understanding, Data Preparation + 1 more

Category: Technical - ML

Maintain AI Dataset Provenance

Lifecycle: Data Preparation, Business and Data Understanding

Category: Technical - ML

Privileged AI Agent Permissions Configuration

Lifecycle: Deployment

Category: Technical - Cyber

Single-User AI Agent Permissions Configuration

Lifecycle: Deployment

Category: Technical - Cyber

AI Agent Tools Permissions Configuration

Lifecycle: Deployment

Category: Technical - Cyber

Human In-the-Loop for AI Agent Actions

Lifecycle: Deployment

Category: Technical - ML

Restrict AI Agent Tool Invocation on Untrusted Data

Lifecycle: Deployment

Category: Technical - ML

Segmentation of AI Agent Components

Lifecycle: Deployment, Business and Data Understanding

Category: Technical - Cyber

Input and Output Validation for AI Agent Components

Lifecycle: Business and Data Understanding, Data Preparation + 1 more

Category: Technical - ML

Source


Included resource

Regulating under Uncertainty: Governance Options for Generative AI

Authors: G'sell; Year: 2024; Type: Report

DOI: 10.2139/ssrn.4918704

URL: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4918704

Original source

MIT AI Risk Repository


Repository: https://airisk.mit.edu/