Publish Poisoned Datasets - AI Security Technique

Overview

A source-backed snapshot of this AI security technique.

Tactics (1): Attacker goals connected to this method.

Mitigations (3): Defenses that may help against this attack.

AI risks (11): Research-backed risks connected to this topic.

Technique details

Identifiers, maturity, and source taxonomy for this technique.

ATLAS ID: AML.T0019
Maturity: demonstrated
Priority score: 94

ATLAS tactics

Resource Development

Attack flow

How to read the public records connected to this technique.

1. Technique: Read the ATLAS description and evidence level.

2. Tactics: See which attacker goals this method supports.

3. Examples: Check whether public case studies mention it.

4. Defenses: Review safeguards mapped by ATLAS.

5. Sources: Open the original public records and references.

Impact

Why this technique may deserve attention in the current dataset.

Evidence level: demonstrated
Mapped defenses: 3 ATLAS mitigation records
Public examples: 1 linked case study record
Research risks: 11 related MIT AI Risk records above the confidence threshold
Vulnerabilities: 0 linked CVE records

Mitigations

Defenses that may help against this attack.

3 records

AML.M0023 - AI Bill of Materials

An AI BOM can help users identify untrustworthy model artifacts.

Lifecycle: Business and Data Understanding, Data Preparation, + 1 more
Category: Policy

AML.M0025 - Maintain AI Dataset Provenance

Maintaining a detailed history of datasets can help identify use of poisoned datasets from public sources.

Lifecycle: Data Preparation, Business and Data Understanding
Category: Technical - ML
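One way to keep such a history is a hash-chained log, where each entry stores a digest of the dataset after a step and a digest of the previous entry. The sketch below is illustrative only: `ProvenanceEntry`, `append_entry`, and `chain_is_intact` are hypothetical names, and the field layout is an assumption rather than anything ATLAS prescribes.

```python
import hashlib
import json
from dataclasses import dataclass, asdict
from typing import List


@dataclass(frozen=True)
class ProvenanceEntry:
    dataset: str          # hypothetical dataset name
    source: str           # where the data came from (URL, path, team)
    action: str           # e.g. "downloaded", "filtered", "merged"
    content_sha256: str   # digest of the dataset bytes after this step
    prev_entry_hash: str  # digest of the previous log entry


GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def entry_hash(entry: ProvenanceEntry) -> str:
    """Digest of an entry's fields, serialized in a stable key order."""
    payload = json.dumps(asdict(entry), sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append_entry(log: List[ProvenanceEntry], dataset: str, source: str,
                 action: str, content: bytes) -> List[ProvenanceEntry]:
    """Return a new log with one more entry linked to the current tail."""
    prev = entry_hash(log[-1]) if log else GENESIS
    digest = hashlib.sha256(content).hexdigest()
    return log + [ProvenanceEntry(dataset, source, action, digest, prev)]


def chain_is_intact(log: List[ProvenanceEntry]) -> bool:
    """Check that every entry points at the hash of the entry before it."""
    expected = GENESIS
    for entry in log:
        if entry.prev_entry_hash != expected:
            return False
        expected = entry_hash(entry)
    return True
```

Editing any earlier entry changes its hash and breaks the link held by the next entry. The final entry has no successor, so its hash would need to be stored somewhere trusted for the check to cover it.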

AML.M0014 - Verify AI Artifacts

Determine validity of published data in order to avoid using poisoned data that introduces vulnerabilities.

Lifecycle: Business and Data Understanding, Data Preparation, + 1 more
Category: Technical - Cyber
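A common form of artifact verification is comparing a downloaded file against a digest published by the dataset's maintainer. The following is a minimal sketch under that assumption; `sha256_of_file` and `verify_artifact` are hypothetical helper names, and the check only helps if the expected digest comes from a channel the attacker does not control.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large datasets do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_artifact(path: Path, expected_sha256: str) -> bool:
    """Return True only if the file's SHA-256 matches the expected hex digest."""
    actual = sha256_of_file(path)
    # Normalize whitespace and case, since published digests vary in format.
    return hmac.compare_digest(actual, expected_sha256.strip().lower())
```

A matching digest shows that the file is the one the publisher hashed. It does not show that the publisher's data was clean to begin with.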

Case studies

Examples from public reports and exercises.

1 record

Web-Scale Data Poisoning: Split-View Attack

Many recent large-scale datasets are distributed as a list of URLs pointing to individual datapoints. The researchers show that many of these datasets are vulnerable to a "split-view" poisoning attack. The attack exploits the fact that the data viewed when it was initially collected may differ from the data viewed by a user during training. The researchers identify expired and buyable domains that once hosted dataset content, making it possible to replace portions of the dataset with poisoned data. They demonstrate that for 10 popular web-scale datasets, enough of the domains are purchasable to successfully carry out a poisoning attack.

Date: 2024-06-06

Type: exercise
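The gap between the collection-time view and the training-time view can be illustrated with a small simulation. This sketch assumes the dataset index stores a SHA-256 digest next to each URL, which the case study summary above does not state. `build_index`, `changed_urls`, and the injected `fetch` callable are hypothetical names, and a dictionary can stand in for the web so that nothing is downloaded.

```python
import hashlib
from typing import Callable, Iterable, List, Tuple

# Each index row is (url, sha256 of the content seen at collection time).
Index = List[Tuple[str, str]]
Fetch = Callable[[str], bytes]


def build_index(urls: Iterable[str], fetch: Fetch) -> Index:
    """Collection time: record a digest of what each URL served."""
    return [(url, hashlib.sha256(fetch(url)).hexdigest()) for url in urls]


def changed_urls(index: Index, fetch: Fetch) -> List[str]:
    """Training time: list URLs whose content no longer matches the index."""
    changed = []
    for url, recorded in index:
        current = hashlib.sha256(fetch(url)).hexdigest()
        if current != recorded:
            changed.append(url)
    return changed
```

For example, building an index over a dictionary of URL-to-bytes pairs, overwriting one value to mimic a re-registered domain, and then calling `changed_urls` with the same dictionary lookup would flag only the overwritten URL. A real fetcher would also need to handle timeouts and missing pages, which this sketch leaves out.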

Related risks

Research-backed risks connected to this topic.

Top 10 of 11

Adversarial AI (General)

"Adversarial AI refers to a class of attacks that exploit vulnerabilities in machine-learning (ML) models. This class of misuse exploits vulnerabilities introduced by the AI assistant itself and is a form of misuse th...

Domain: 2. Privacy & Security
Subdomain: 2.2 > AI system security vulnerabilities and attacks

Confidence: 0.73

Fine-tuning related (Poisoning models during instruction tuning)

"AI models can be poisoned during instruction tuning when models are tuned using pairs of instructions and desired outputs. Poisoning in instruction tuning can be achieved with a lower number of compromised samples, a...

Domain: 2. Privacy & Security
Subdomain: 2.2 > AI system security vulnerabilities and attacks

Confidence: 0.72

Jailbreak in LLM Malicious Use - Poisoning Training Data

"In the data collecting and pre-training phase, malicious adversaries can Jailbreak LLMs through poisoning their training data to make the model to output harmful content."

Domain: 2. Privacy & Security
Subdomain: 2.2 > AI system security vulnerabilities and attacks

Confidence: 0.72

Data poisoning

"A type of adversarial attack where an adversary or malicious insider injects intentionally corrupted, false, misleading, or incorrect samples into the training or fine-tuning datasets."

Domain: 2. Privacy & Security
Subdomain: 2.2 > AI system security vulnerabilities and attacks

Confidence: 0.72

Showing 4 of 10