PromptRiskDB: Threat intelligence atlas
AI Security Technique

Data - AI Security Technique



Record summary

A quick snapshot of what this page covers.

Tactics: 0. Attacker goals connected to this method.
Mitigations: 4. Defenses that may help against this attack.
AI risks: 9. Research-backed risks connected to this topic.

Attack context

How this AI attack works in practice.

Data is a key vector of supply chain compromise. Every AI project requires some form of data, and many rely on large, publicly available open source datasets. An adversary can compromise these data sources. The malicious data may be the result of Poison Training Data, or it may carry traditional malware.

An adversary can also target private datasets in the labeling phase. The creation of private datasets will often require the hiring of outside labeling services. An adversary can poison a dataset by modifying the labels being generated by the labeling service.
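One way to catch modified labels is to compare the vendor's output against a small subset labeled in-house. The sketch below assumes such a trusted "gold" subset exists. The function name `audit_vendor_labels`, the 5 percent threshold, and the sample data are illustrative assumptions, not part of the ATLAS entry.

```python
from collections import Counter

def audit_vendor_labels(vendor_labels, gold_labels, max_disagreement=0.05):
    """Compare vendor labels with a trusted gold subset.

    Both arguments map item IDs to labels. Only items present in
    both mappings are compared.
    """
    shared = sorted(set(vendor_labels) & set(gold_labels))
    if not shared:
        raise ValueError("no overlap between vendor labels and gold subset")
    mismatches = [i for i in shared if vendor_labels[i] != gold_labels[i]]
    # Count each (gold, vendor) flip. One dominant flip can hint at
    # targeted label changes rather than random annotation noise.
    flips = Counter((gold_labels[i], vendor_labels[i]) for i in mismatches)
    rate = len(mismatches) / len(shared)
    return {
        "rate": rate,
        "suspicious": rate > max_disagreement,  # threshold is an assumption
        "flips": flips,
        "mismatched_ids": mismatches,
    }

# Hypothetical data: item "b" was labeled "cat" in-house but "dog" by the vendor.
vendor = {"a": "cat", "b": "dog", "c": "dog", "d": "cat"}
gold = {"a": "cat", "b": "cat", "c": "dog"}
report = audit_vendor_labels(vendor, gold)
print(report["mismatched_ids"], report["suspicious"])
```

A high disagreement rate does not prove poisoning, since honest annotators also make mistakes. It is a signal that the batch deserves manual review.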

ATLAS ID: AML.T0010.002
Priority score: 107
Maturity: realized

Mitigations

Defenses that may help against this attack.

AML.M0025 - Maintain AI Dataset Provenance

Lifecycle: Data Preparation, Business and Data Understanding
Category: Technical - ML

Dataset provenance can protect against supply chain compromise of data.
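As a rough illustration, provenance can be tracked by recording where each dataset file came from together with a content hash, then re-checking the hash before use. This is a minimal sketch using only the Python standard library. The manifest layout, the function names, and the `example.com` URL are assumptions made for the example, not a format defined by ATLAS.

```python
import hashlib
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path

def sha256_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large datasets need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def record_provenance(path, source, manifest_path):
    """Append a provenance entry (file, source, hash, time) to a JSON manifest."""
    path, manifest_path = Path(path), Path(manifest_path)
    entries = json.loads(manifest_path.read_text()) if manifest_path.exists() else []
    entries.append({
        "file": path.name,
        "source": source,
        "sha256": sha256_file(path),
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    })
    manifest_path.write_text(json.dumps(entries, indent=2))
    return entries[-1]

def changed_since_recorded(path, manifest_path):
    """Return True if the file's hash differs from its latest manifest entry."""
    entries = json.loads(Path(manifest_path).read_text())
    recorded = [e for e in entries if e["file"] == Path(path).name]
    if not recorded:
        raise KeyError(f"no provenance entry for {Path(path).name}")
    return recorded[-1]["sha256"] != sha256_file(path)

# Demo with a throwaway file; the source URL is a placeholder.
with tempfile.TemporaryDirectory() as tmp:
    data = Path(tmp) / "train.csv"
    manifest = Path(tmp) / "manifest.json"
    data.write_text("text,label\nhello,0\n")
    entry = record_provenance(data, "https://example.com/train.csv", manifest)
    unchanged = changed_since_recorded(data, manifest)
    data.write_text("text,label\nhello,1\n")  # simulate tampering
    tampered = changed_since_recorded(data, manifest)
print(unchanged, tampered)
```

The manifest itself must be stored somewhere an attacker cannot rewrite it, otherwise a tampered file and a matching hash could be swapped in together.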

AML.M0007 - Sanitize Training Data

Lifecycle: Business and Data Understanding, Data Preparation, and 1 more
Category: Technical - ML

Detect and remove or remediate poisoned data to avoid adversarial model drift or backdoor attacks.
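Detection can take many forms. One simple heuristic, shown here only as an illustration, flags samples whose label disagrees with the majority label of their nearest neighbors in feature space. The function name, the choice of `k`, and the toy data are assumptions. The ATLAS entry does not prescribe this or any other specific method.

```python
import math
from collections import Counter

def flag_label_outliers(points, labels, k=3):
    """Return indices whose label disagrees with the majority label
    of their k nearest neighbours (Euclidean distance)."""
    flagged = []
    for i, p in enumerate(points):
        # Distance from point i to every other point, nearest first.
        others = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        neighbour_labels = [labels[j] for _, j in others[:k]]
        majority, _ = Counter(neighbour_labels).most_common(1)[0]
        if majority != labels[i]:
            flagged.append(i)
    return flagged

# Toy data: two tight clusters; index 3 sits in the "benign" cluster
# but carries a "malicious" label, as a flipped label might.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1)]
labels = ["benign", "benign", "benign", "malicious",
          "malicious", "malicious", "malicious", "malicious"]
suspect = flag_label_outliers(points, labels, k=3)
print(suspect)
```

Flagged samples are candidates for review rather than automatic deletion, because legitimate edge cases can also disagree with their neighbors.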

AML.M0014 - Verify AI Artifacts

Lifecycle: Business and Data Understanding, Data Preparation, and 1 more
Category: Technical - Cyber

Verify signatures on AI artifacts so that unsafe AI data is not introduced into the system.
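The check-before-load pattern can be sketched as follows. To stay within the Python standard library, this example substitutes an HMAC-SHA256 tag with a shared key for a real public-key signature. A production setup would more likely verify an asymmetric signature from the data publisher. The function names and the demo key are assumptions.

```python
import hashlib
import hmac

def sign_artifact(data: bytes, key: bytes) -> str:
    """Compute an HMAC-SHA256 tag over the artifact bytes."""
    return hmac.new(key, data, hashlib.sha256).hexdigest()

def load_if_verified(data: bytes, signature: str, key: bytes) -> bytes:
    """Return the data only if its tag matches; otherwise refuse to load."""
    expected = sign_artifact(data, key)
    # compare_digest performs a constant-time comparison.
    if not hmac.compare_digest(expected, signature):
        raise ValueError("signature mismatch: refusing to load artifact")
    return data

# Demo values; the key is a placeholder and must not be reused.
key = b"demo-key-not-for-production"
dataset = b"text,label\nhello,0\n"
tag = sign_artifact(dataset, key)

ok = load_if_verified(dataset, tag, key)
try:
    load_if_verified(dataset + b"evil,1\n", tag, key)  # tampered copy
    rejected = False
except ValueError:
    rejected = True
print(ok == dataset, rejected)
```

Failing closed, by raising instead of logging and continuing, keeps unverified data out of the training pipeline by default.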

Case studies

Examples from public reports and exercises.

VirusTotal Poisoning

incident
Date: 2020-01-01

McAfee Advanced Threat Research noticed an unusual increase in reports of a certain ransomware family. Investigation revealed that many samples of that family had been submitted through a popular virus-sharing platform within a short period of time. Further investigation showed that, based on string similarity, the samples were all equivalent, and based on code similarity, they were between 74 and 98 percent similar. Notably, the compile time was the same for all the samples. After more digging, researchers discovered that someone had used 'metame', a metamorphic code manipulation tool, to mutate the original file into variants. The variants were not always executable, but they were still classified as the same ransomware family.

Tay Poisoning

incident
Date: 2016-03-23

Microsoft created Tay, a Twitter chatbot designed to engage and entertain users. While previous chatbots used pre-programmed scripts to respond to prompts, Tay's machine learning capabilities allowed it to be directly influenced by its conversations.

A coordinated attack encouraged malicious users to tweet abusive and offensive language at Tay, which eventually led to Tay generating similarly inflammatory content towards other users.

Microsoft decommissioned Tay within 24 hours of its launch and issued a public apology with lessons learned from the bot's failure.

Source

Where this page's information comes from.