Transferable adversarial attacks from open to closed-source models

Record summary

A quick snapshot of what this page covers.

Techniques: 6. Attack methods connected to this risk.

Mitigations: 15. Defenses that may help with related attacks.

Domain: 2. Privacy & Security. The broad risk area this belongs to.

Risk profile

How this risk is described and categorized.

Domain: 2. Privacy & Security

Subdomain: 2.2 > AI system security vulnerabilities and attacks

Entity: 1 - Human

Intent: 1 - Intentional

Timing: 2 - Post-deployment

Category: Attacks on GPAIs/GPAI Failure Modes

Subcategory: Transferable adversarial attacks from open to closed-source models

Related techniques

Attack methods connected to this risk.

AML.T0044 - Full AI Model Access

demonstrated

Method: taxonomy_keyword_rule; Confidence: 62%

AML.T0010.002 - Data

realized

Method: taxonomy_keyword_rule; Confidence: 57%

AML.T0008.002 - Domains

demonstrated

Method: taxonomy_keyword_rule; Confidence: 55%

AML.T0015 - Evade AI Model

realized

Method: taxonomy_keyword_rule; Confidence: 55%

AML.T0054 - LLM Jailbreak

demonstrated

Method: taxonomy_keyword_rule; Confidence: 55%

AML.T0058 - Publish Poisoned Models

realized

Method: taxonomy_keyword_rule; Confidence: 55%

Suggested mitigations

Defenses that may help with related attacks.

Control Access to AI Models and Data at Rest

Lifecycle: Business and Data Understanding, Data Preparation, + 2 more; Category: Policy

AI Model Distribution Methods

Lifecycle: Deployment; Category: Policy

Sanitize Training Data

Lifecycle: Business and Data Understanding, Data Preparation, + 1 more; Category: Technical - ML

Verify AI Artifacts

Lifecycle: Business and Data Understanding, Data Preparation, + 1 more; Category: Technical - Cyber

Maintain AI Dataset Provenance

Lifecycle: Data Preparation, Business and Data Understanding; Category: Technical - ML

Model Hardening

Lifecycle: Data Preparation, ML Model Engineering; Category: Technical - ML

Use Ensemble Methods

Lifecycle: ML Model Engineering; Category: Technical - ML

Use Multi-Modal Sensors

Lifecycle: Business and Data Understanding, Data Preparation, + 1 more; Category: Technical - Cyber

Input Restoration

Lifecycle: Data Preparation, ML Model Evaluation, + 2 more; Category: Technical - ML

Adversarial Input Detection

Lifecycle: Data Preparation, ML Model Engineering, + 3 more; Category: Technical - ML

Deepfake Detection

Lifecycle: Deployment, Monitoring and Maintenance, + 2 more; Category: Technical - ML

Generative AI Guardrails

Lifecycle: ML Model Engineering, ML Model Evaluation, + 1 more; Category: Technical - ML

Generative AI Guidelines

Lifecycle: ML Model Engineering, ML Model Evaluation, + 1 more; Category: Technical - ML

Generative AI Model Alignment

Lifecycle: ML Model Engineering, ML Model Evaluation, + 1 more; Category: Technical - ML

AI Bill of Materials

Lifecycle: Business and Data Understanding, Data Preparation, + 1 more; Category: Policy

Source

Research source for this risk, when available.

Included resource

Risk Sources and Risk Management Measures in Support of Standards for General-Purpose AI Systems

Authors: Gipiškis et al.; Year: 2024; Type: Journal Article

DOI: https://doi.org/10.48550/arXiv.2410.23472
URL: https://arxiv.org/abs/2410.23472

Original source

MIT AI Risk Repository

Open the public repository used for AI risk records and taxonomy fields.

Repository: https://airisk.mit.edu/