Insert Backdoor Trigger - AI Security Technique

Overview

A source-backed snapshot of this AI security technique.

Tactics0Attacker goals connected to this method.

Mitigations5Defenses that may help against this attack.

AI risks13Research-backed risks connected to this topic.

Technique details

Identifiers, maturity, and source taxonomy for this technique.

ATLAS ID: AML.T0043.004
Maturity: demonstrated
Priority score: 110

Attack flow

How to read the public records connected to this technique.

1. TechniqueRead the ATLAS description and evidence level.

2. TacticsSee which attacker goals this method supports.

3. ExamplesCheck whether public case studies mention it.

4. DefensesReview safeguards mapped by ATLAS.

5. SourcesOpen the original public records and references.

Impact

Why this technique may deserve attention in the current dataset.

Evidence leveldemonstrated
Mapped defenses5 ATLAS mitigation records
Public examples1 linked case study records
Research risks13 related MIT AI Risk records above the confidence threshold
Vulnerabilities0 linked CVE records

Mitigations

Defenses that may help against this attack.

5 recordsView all mitigations →

AML.M0015 - Adversarial Input Detection

Incorporate adversarial input detection to block malicious inputs at inference time.

LifecycleData Preparation + 4 moreCategoryTechnical - ML

Data PreparationML Model Engineering+3 more

AML.M0010 - Input Restoration

Input restoration can help remediate adversarial inputs.

LifecycleData Preparation + 3 moreCategoryTechnical - ML

Data PreparationML Model Evaluation+2 more

AML.M0003 - Model Hardening

Hardened models are more robust to adversarial inputs.

LifecycleData Preparation + 1 moreCategoryTechnical - ML

Data PreparationML Model Engineering

AML.M0006 - Use Ensemble Methods

Using an ensemble of models increases the difficulty of crafting effective adversarial data and improves overall robustness.

LifecycleML Model EngineeringCategoryTechnical - ML

ML Model Engineering

Showing 4 of 5

Case studies

Examples from public reports and exercises.

1 recordView all case studies →

Backdoor Attack on Deep Learning Models in Mobile Apps

Deep learning models are increasingly used in mobile applications as critical components. Researchers from Microsoft Research demonstrated that many deep learning models deployed in mobile apps are vulnerable to backdoor attacks via "neural payload injection." They conducted an empirical study on real-world mobile deep learning apps collected from Google Play. They identified 54 apps that were vulnerable to attack, including popular security and safety critical applications used for cash recognition, parental control, face authentication, and financial services.

Date2021-01-18

exercise

Related risks

Research-backed risks connected to this topic.

Top 10 of 13View all risks →

Poisoning Attacks

"Poisoning attacks [143] could influence the behavior of the model by making small changes to the training data. A number of efforts could even leverage data poisoning techniques to implant hidden triggers into models...

Domain2. Privacy & SecuritySubdomain2.2 > AI system security vulnerabilities and attacks

Confidence0.71

Fine-tuning related (Fine-tuning dataset poisoning)

"A deployer can poison the dataset used during the fine-tuning process [98] to induce specific, often malicious, behaviors in a model. This can be performed without having access to the model’s weights. This poisoning...

Domain2. Privacy & SecuritySubdomain2.2 > AI system security vulnerabilities and attacks

Confidence0.71

Fine-tuning related (Poisoning models during instruction tuning)

"AI models can be poisoned during instruction tuning when models are tuned using pairs of instructions and desired outputs. Poisoning in instruction tuning can be achieved with a lower number of compromised samples, a...

Domain2. Privacy & SecuritySubdomain2.2 > AI system security vulnerabilities and attacks

Confidence0.70

Vulnerability to Poisoning and Backdoors

"The previous section explored jailbreaks and other forms of adversarial prompts as ways to elicit harmful capabilities acquired during pretraining. These methods make no assumptions about the training data. On the ot...

Domain2. Privacy & SecuritySubdomain2.2 > AI system security vulnerabilities and attacks

Confidence0.69

Showing 4 of 10