Record summary
A quick snapshot of what this page covers.
Attack context
How this AI attack works in practice.
- ATLAS ID
- AML.T0031
- Priority score
- 92
Mitigations
Defenses that may help against this attack.
AML.M0015 - Adversarial Input Detection
Incorporate adversarial input detection into the pipeline before inputs reach the model.
AML.M0010 - Input Restoration
Preprocessing model inputs can prevent malicious data from going through the machine learning pipeline.
AML.M0003 - Model Hardening
Hardened models are less susceptible to integrity attacks.
AML.M0006 - Use Ensemble Methods
Using multiple different models increases robustness to attack.
Case studies
Examples from public reports and exercises.
Web-Scale Data Poisoning: Split-View Attack
Many recent large-scale datasets are distributed as a list of URLs pointing to individual datapoints. The researchers show that many of these datasets are vulnerable to a "split-view" poisoning attack. The attack exploits the fact that the data viewed when it was initially collected may differ from the data viewed by a user during training. The researchers identify expired and buyable domains that once hosted dataset content, making it possible to replace portions of the dataset with poisoned data. They demonstrate that for 10 popular web-scale datasets, enough of the domains are purchasable to successfully carry out a poisoning attack.
PoisonGPT
Researchers from Mithril Security demonstrated how to poison an open-source pre-trained large language model (LLM) to return a false fact. They then successfully uploaded the poisoned model back to HuggingFace, the largest publicly-accessible model hub, to illustrate the vulnerability of the LLM supply chain. Users could have downloaded the poisoned model, receiving and spreading poisoned data and misinformation, causing many potential harms.
Attack on Machine Translation Services
Machine translation services (such as Google Translate, Bing Translator, and Systran Translate) provide public-facing UIs and APIs. A research group at UC Berkeley utilized these public endpoints to create a replicated model with near-production state-of-the-art translation quality. Beyond demonstrating that IP can be functionally stolen from a black-box system, they used the replicated model to successfully transfer adversarial examples to the real production services. These adversarial inputs successfully cause targeted word flips, vulgar outputs, and dropped sentences on Google Translate and Systran Translate websites.
ClearviewAI Misconfiguration
Clearview AI makes a facial recognition tool that searches publicly available photos for matches. This tool has been used for investigative purposes by law enforcement agencies and other parties.
Clearview AI's source code repository, though password protected, was misconfigured to allow an arbitrary user to register an account. This allowed an external researcher to gain access to a private code repository that contained Clearview AI production credentials, keys to cloud storage buckets containing 70K video samples, and copies of its applications and Slack tokens. With access to training data, a bad actor has the ability to cause an arbitrary misclassification in the deployed model. These kinds of attacks illustrate that any attempt to secure ML system should be on top of "traditional" good cybersecurity hygiene such as locking down the system with least privileges, multi-factor authentication and monitoring and auditing.
Tay Poisoning
Microsoft created Tay, a Twitter chatbot designed to engage and entertain users. While previous chatbots used pre-programmed scripts to respond to prompts, Tay's machine learning capabilities allowed it to be directly influenced by its conversations.
A coordinated attack encouraged malicious users to tweet abusive and offensive language at Tay, which eventually led to Tay generating similarly inflammatory content towards other users.
Microsoft decommissioned Tay within 24 hours of its launch and issued a public apology with lessons learned from the bot's failure.
Source
Where this page information comes from.
Original source
Original source links
Open the public records and source datasets used for this page.