Backdoors or trojan attacks in GPAI models

Record summary

A quick snapshot of what this page covers.

Techniques: 2 (attack methods connected to this risk)

Mitigations: 6 (defenses that may help with related attacks)

Domain: 2. Privacy & Security (the broad risk area this belongs to)

Risk profile

How this risk is described and categorized.

"Backdoors can be inserted into GPAI models during their training or fine-tuning, to be exploited during deployment [185, 118]. Attackers inserting the backdoor can be the GPAI model provider themselves or another actor (e.g., by manipulating the training data or the software infrastructure used by the model provider) [222]. Some backdoors can be exploited with minimal overhead, allowing attackers to control the model outputs in a targeted way with a high success rate [90]."

Domain: 2. Privacy & Security

Subdomain: 2.2 > AI system security vulnerabilities and attacks

Entity: 1 - Human

Intent: 1 - Intentional

Timing: 1 - Pre-deployment

Category: Attacks on GPAIs/GPAI Failure Modes

Subcategory: Backdoors or trojan attacks in GPAI models
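
The description above refers to backdoors that control model outputs "with a high success rate". One common way to put a number on that claim is the attack success rate: among inputs whose true label is not the attacker's target, the fraction that the model assigns to the target once a trigger is applied. The sketch below is illustrative only and is not taken from the cited source; the function name `attack_success_rate` and the toy labels are assumptions made for this example.

```python
from typing import Hashable, Sequence


def attack_success_rate(
    clean_labels: Sequence[Hashable],
    triggered_preds: Sequence[Hashable],
    target_label: Hashable,
) -> float:
    """Fraction of triggered inputs (true label != target) predicted as the target.

    clean_labels:    ground-truth labels of the original, untriggered inputs
    triggered_preds: model predictions on the same inputs with the trigger applied
    target_label:    the output the attacker wants to force
    """
    if len(clean_labels) != len(triggered_preds):
        raise ValueError("clean_labels and triggered_preds must have the same length")

    # Inputs that already belong to the target class say nothing about the backdoor,
    # so they are excluded from the denominator.
    eligible = [
        pred
        for true, pred in zip(clean_labels, triggered_preds)
        if true != target_label
    ]
    if not eligible:
        raise ValueError("no inputs with a true label different from target_label")

    hits = sum(1 for pred in eligible if pred == target_label)
    return hits / len(eligible)


if __name__ == "__main__":
    # Toy data (hypothetical): 3 of the 5 inputs have a true label other than 1,
    # and 2 of those 3 flip to label 1 when the trigger is present.
    labels = [0, 1, 2, 1, 0]
    preds_with_trigger = [1, 1, 1, 1, 0]
    print(attack_success_rate(labels, preds_with_trigger, target_label=1))
```

A metric like this is typically reported alongside accuracy on clean inputs, since a backdoored model can look normal on clean data while still responding to the trigger.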

Related techniques

Attack methods connected to this risk.

AML.T0018.000 - Poison AI Model

demonstrated

Method: taxonomy_keyword_rule; Confidence: 57%

AML.T0007 - Discover AI Artifacts

demonstrated

Method: taxonomy_keyword_rule; Confidence: 56%

Suggested mitigations

Defenses that may help with related attacks.

Control Access to AI Models and Data at Rest

Lifecycle: Business and Data Understanding, Data Preparation, and 2 more; Category: Policy

Sanitize Training Data

Lifecycle: Business and Data Understanding, Data Preparation, and 1 more; Category: Technical - ML

Validate AI Model

Lifecycle: ML Model Evaluation, Monitoring and Maintenance; Category: Technical - ML

Code Signing

Lifecycle: Deployment; Category: Technical - Cyber

Maintain AI Dataset Provenance

Lifecycle: Data Preparation, Business and Data Understanding; Category: Technical - ML

Encrypt Sensitive Information

Lifecycle: Data Preparation, ML Model Engineering, and 1 more; Category: Technical - Cyber
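
Several of the mitigations listed above (dataset provenance, code signing, access control at rest) rest on being able to detect that a dataset or model artifact has changed since it was last reviewed. The following is a minimal sketch of one building block, a SHA-256 hash manifest, using only the Python standard library. It is not code signing: a manifest carries no cryptographic signature and only helps if it is stored somewhere an attacker cannot modify. The function names (`build_manifest`, `verify_manifest`) and the file names in the demo are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Dict, List


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large model weights need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> Dict[str, str]:
    """Map each file under root (by relative POSIX path) to its SHA-256 digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_of_file(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_manifest(root: Path, manifest: Dict[str, str]) -> List[str]:
    """Return the sorted paths that are missing, altered, or newly added."""
    current = build_manifest(root)
    all_names = manifest.keys() | current.keys()
    return sorted(n for n in all_names if manifest.get(n) != current.get(n))


if __name__ == "__main__":
    # Demo on a throwaway directory; file names are placeholders.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "train.csv").write_text("text,label\nhello,0\n")
        (root / "weights.bin").write_bytes(b"\x00\x01\x02")

        recorded = build_manifest(root)           # taken at review time
        (root / "weights.bin").write_bytes(b"\x00\x01\xff")  # simulated tampering

        print(verify_manifest(root, recorded))    # lists the files that differ
```

A check like this detects changes to files at rest; it does not detect a backdoor that was already present when the manifest was recorded, which is where training-data sanitization and model validation come in.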

Source

Research source for this risk, when available.

Included resource

Risk Sources and Risk Management Measures in Support of Standards for General-Purpose AI Systems

Authors: Gipiškis et al.; Year: 2024; Type: Journal Article

DOI: https://doi.org/10.48550/arXiv.2410.23472

URL: https://arxiv.org/abs/2410.23472

Original source

MIT AI Risk Repository

Open the public repository used for AI risk records and taxonomy fields.

Repository: https://airisk.mit.edu/