Record summary
A quick snapshot of what this page covers.
Attack context
How this AI attack works in practice.
Data is a key vector of supply chain compromise for adversaries. Every AI project requires some form of data, and many rely on large, publicly available open source datasets. An adversary could compromise these data sources. The malicious data could be the result of Poison Training Data or could include traditional malware.
An adversary can also target private datasets in the labeling phase. Creating a private dataset often requires hiring an outside labeling service. An adversary can poison the dataset by modifying the labels that the labeling service generates.
- ATLAS ID: AML.T0010.002
- Priority score: 107
Mitigations
Defenses that may help against this attack.
AML.M0005 - Control Access to AI Models and Data at Rest
Access controls can prevent tampering with ML artifacts and prevent unauthorized copying.
AML.M0025 - Maintain AI Dataset Provenance
Dataset provenance can protect against supply chain compromise of data.
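One way to make provenance concrete is to record, at download time, where a dataset came from and what each file looked like on arrival. The sketch below is a minimal illustration, not a prescribed ATLAS procedure: the manifest layout, the field names (`source`, `retrieved_at`, `files`), and the helper names `build_manifest` and `changed_files` are all assumptions made for this example.

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large dataset shards need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(data_dir: Path, source: str) -> dict:
    """Record the origin of the data and a fingerprint of every file (hypothetical layout)."""
    files = sorted(p for p in data_dir.rglob("*") if p.is_file())
    return {
        "source": source,  # assumed field: URL or vendor name
        "retrieved_at": datetime.now(timezone.utc).isoformat(),
        "files": [
            {
                "path": p.relative_to(data_dir).as_posix(),
                "bytes": p.stat().st_size,
                "sha256": sha256_of(p),
            }
            for p in files
        ],
    }


def changed_files(data_dir: Path, manifest: dict) -> list[str]:
    """Return manifest paths that are now missing or whose hash no longer matches."""
    changed = []
    for entry in manifest["files"]:
        path = data_dir / entry["path"]
        if not path.is_file() or sha256_of(path) != entry["sha256"]:
            changed.append(entry["path"])
    return changed
```

The manifest is a plain dictionary, so it can be written out with `json.dump` and stored separately from the data. A manifest only shows that files have changed since it was recorded; it cannot show that the upstream source was clean to begin with.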
AML.M0007 - Sanitize Training Data
Detect and remove or remediate poisoned data to avoid adversarial model drift or backdoor attacks.
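For datasets labeled by an outside service, one simple sanitization check is to compare the delivered labels against a small subset labeled by a trusted party. The sketch below assumes such a trusted "gold" subset exists and that labels are keyed by item ID; the function names and the 5 percent threshold are illustrative assumptions, not values taken from ATLAS.

```python
from collections import Counter


def audit_labels(vendor_labels: dict, gold_labels: dict, max_disagreement: float = 0.05):
    """Compare vendor labels to a trusted gold subset.

    Returns (disagreement_rate, suspicious_ids, accept), where accept is True
    when the disagreement rate is at or below the assumed threshold.
    """
    shared = sorted(set(vendor_labels) & set(gold_labels))
    if not shared:
        raise ValueError("no overlap between vendor labels and the gold subset")
    suspicious = [i for i in shared if vendor_labels[i] != gold_labels[i]]
    rate = len(suspicious) / len(shared)
    return rate, suspicious, rate <= max_disagreement


def flip_pattern(vendor_labels: dict, gold_labels: dict, suspicious: list) -> Counter:
    """Count (gold, vendor) label pairs among the disagreements.

    A single dominant pair may hint at targeted label flipping rather than
    ordinary annotator noise.
    """
    return Counter((gold_labels[i], vendor_labels[i]) for i in suspicious)
```

A high disagreement rate does not prove poisoning, since honest annotators also make mistakes. It is a signal to hold the batch for review before it reaches training.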
AML.M0014 - Verify AI Artifacts
Introduce proper checking of signatures to ensure that unsafe AI data will not be introduced to the system.
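As a rough illustration of verifying an artifact before use, the sketch below checks an HMAC-SHA256 tag with Python's standard library. This is a stand-in: an HMAC relies on a shared secret key, whereas a production pipeline would more likely verify a public-key signature published by the data provider. The function names are hypothetical.

```python
import hashlib
import hmac


def sign_artifact(data: bytes, key: bytes) -> str:
    """Compute an HMAC-SHA256 tag over the artifact bytes."""
    return hmac.new(key, data, hashlib.sha256).hexdigest()


def verify_artifact(data: bytes, key: bytes, expected_tag: str) -> bool:
    """Return True only if the tag matches; the comparison is constant-time."""
    actual = sign_artifact(data, key)
    return hmac.compare_digest(actual, expected_tag)


def load_verified(data: bytes, key: bytes, expected_tag: str) -> bytes:
    """Refuse to hand back data that fails verification."""
    if not verify_artifact(data, key, expected_tag):
        raise ValueError("artifact failed verification; refusing to load")
    return data
```

Failing closed in `load_verified` is a deliberate choice in this sketch: a dataset that cannot be verified never reaches the training code, so a missed check cannot silently pass.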
Case studies
Examples from public reports and exercises.
VirusTotal Poisoning
McAfee Advanced Threat Research noticed an unusual increase in reports of a certain ransomware family. Investigation revealed that many samples of that ransomware family had been submitted through a popular virus-sharing platform within a short period of time. Further investigation revealed that, based on string similarity, the samples were all equivalent, and based on code similarity, they were between 74 and 98 percent similar. Interestingly, the compile time was the same for all the samples. After more digging, researchers discovered that someone had used 'metame', a metamorphic code manipulation tool, to mutate the original file into variants. The variants were not always executable, but were still classified as the same ransomware family.
Tay Poisoning
Microsoft created Tay, a Twitter chatbot designed to engage and entertain users. While previous chatbots used pre-programmed scripts to respond to prompts, Tay's machine learning capabilities allowed it to be directly influenced by its conversations.
A coordinated attack encouraged malicious users to tweet abusive and offensive language at Tay, which eventually led to Tay generating similarly inflammatory content towards other users.
Microsoft decommissioned Tay within 24 hours of its launch and issued a public apology with lessons learned from the bot's failure.