PromptRiskDBThreat intelligence atlas
AI Risk

Backdoors or trojan attacks in GPAI models

"Backdoors can be inserted into GPAI models during their training or fine-tuning, to be exploited during deployment [185, 118]. Attackers inserting the backdoor can be the GPAI model provider themselves or another actor (e.g., by ma- nipulating the training data or the software infrastructure used by the model provider) [222]. Some backdoors can be exploited with minimal overhead, al- lowing attackers to control t...

AI Risk2. Privacy & Security2.2 > AI system security vulnerabilities and attacks1 - Pre-deployment

Record summary

A quick snapshot of what this page covers.

Techniques2Attack methods connected to this risk.
Mitigations6Defenses that may help with related attacks.
Domain2. Privacy & SecurityThe broad risk area this belongs to.

Risk profile

How this risk is described and categorized.

"Backdoors can be inserted into GPAI models during their training or fine-tuning, to be exploited during deployment [185, 118]. Attackers inserting the backdoor can be the GPAI model provider themselves or another actor (e.g., by ma- nipulating the training data or the software infrastructure used by the model provider) [222]. Some backdoors can be exploited with minimal overhead, al- lowing attackers to control the model outputs in a targeted way with a high success rate [90]."

Domain2. Privacy & Security
Subdomain2.2 > AI system security vulnerabilities and attacks
Entity1 - Human
Intent1 - Intentional
Timing1 - Pre-deployment
CategoryAttacks on GPAIs/GPAI Failure Modes
SubcategoryBackdoors or trojan attacks in GPAI models

Suggested mitigations

Defenses that may help with related attacks.

Sanitize Training Data

Business and Data UnderstandingData Preparation+1 more
LifecycleBusiness and Data Understanding + 2 moreCategoryTechnical - ML

Validate AI Model

ML Model EvaluationMonitoring and Maintenance
LifecycleML Model Evaluation + 1 moreCategoryTechnical - ML

Code Signing

Deployment
LifecycleDeploymentCategoryTechnical - Cyber

Source

Research source for this risk, when available.