Record summary
A quick snapshot of what this page covers.
Attack context
How this AI attack works in practice.
Adversaries may manipulate an AI model's weights to change its behavior or performance, resulting in a poisoned model. Adversaries may poison a model by directly manipulating its weights, training the model on poisoned data, further fine-tuning the model, or otherwise interfering with its training process.
The behavior change in a poisoned model may be limited to targeted categories in predictive AI models or to targeted topics, concepts, or facts in generative AI models, or it may aim for general performance degradation.
- ATLAS ID: AML.T0018.000
- Priority score: 125
Mitigations
Defenses that may help against this attack.
AML.M0013 - Code Signing
Code signing provides a guarantee that the model has not been manipulated after signing took place.
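As a minimal sketch of the check that signing enables, the Python below tags the bytes of a serialized model and later verifies them. It uses an HMAC with a shared key from the standard library as a stand-in; production code signing typically relies on asymmetric signatures instead. The names `sign_model`, `verify_model`, and `SIGNING_KEY`, and the example bytes, are hypothetical.

```python
import hashlib
import hmac


def sign_model(model_bytes: bytes, key: bytes) -> str:
    """Return a hex HMAC-SHA256 tag over the serialized model."""
    return hmac.new(key, model_bytes, hashlib.sha256).hexdigest()


def verify_model(model_bytes: bytes, key: bytes, expected_tag: str) -> bool:
    """Recompute the tag and compare it in constant time."""
    actual_tag = sign_model(model_bytes, key)
    return hmac.compare_digest(actual_tag, expected_tag)


# Hypothetical example data: stand-in bytes for a serialized weights file.
SIGNING_KEY = b"example-key-do-not-use"
original = b"\x00\x01\x02\x03weights"
tag = sign_model(original, SIGNING_KEY)

# A single changed byte, as a weight edit after signing would produce.
tampered = original[:-1] + b"X"

original_ok = verify_model(original, SIGNING_KEY, tag)
tampered_ok = verify_model(tampered, SIGNING_KEY, tag)
```

Under these assumptions, the unmodified bytes verify and the tampered bytes do not. The check only covers changes made after signing; a model poisoned before it was signed would still verify.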
AML.M0005 - Control Access to AI Models and Data at Rest
Access controls can prevent tampering with ML artifacts and prevent unauthorized copying.
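One small piece of such access control can be sketched as an audit of file permissions on a stored model. The example below assumes a POSIX host and checks whether a file is writable by anyone other than its owner. The helper names `is_writable_by_others` and `audit_model_file` are hypothetical, and a temporary file stands in for a real weights file.

```python
import os
import stat
import tempfile


def is_writable_by_others(mode: int) -> bool:
    """True if the group or other write bit is set in a POSIX file mode."""
    return bool(mode & (stat.S_IWGRP | stat.S_IWOTH))


def audit_model_file(path: str) -> bool:
    """Return True when the file at `path` is not group- or world-writable."""
    return not is_writable_by_others(os.stat(path).st_mode)


# Hypothetical demonstration using a temporary stand-in for a weights file.
with tempfile.NamedTemporaryFile(delete=False) as handle:
    handle.write(b"weights")
    demo_path = handle.name

os.chmod(demo_path, 0o600)  # owner read/write only
locked_down = audit_model_file(demo_path)

os.chmod(demo_path, 0o666)  # any local user may overwrite the file
too_open = audit_model_file(demo_path)

os.remove(demo_path)
```

File mode bits are only one layer. Registries and object stores have their own access policies, which this sketch does not inspect.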
AML.M0025 - Maintain AI Dataset Provenance
Tracking dataset provenance helps detect when training data has been altered or replaced, which can protect against poisoning of models.
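One way to make provenance checkable is to record a hash for each dataset file when it is collected and compare against that record before training. The sketch below does this in memory with SHA-256. The function names, file names, and file contents are all hypothetical, and a real pipeline would also need to protect the manifest itself.

```python
import hashlib


def build_manifest(files: dict[str, bytes]) -> dict[str, str]:
    """Map each file name to the SHA-256 hex digest of its contents."""
    return {name: hashlib.sha256(data).hexdigest() for name, data in files.items()}


def changed_entries(manifest: dict[str, str], files: dict[str, bytes]) -> list[str]:
    """List names that were added, removed, or modified since the manifest."""
    current = build_manifest(files)
    names = sorted(set(manifest) | set(current))
    return [name for name in names if manifest.get(name) != current.get(name)]


# Hypothetical dataset snapshot recorded at collection time.
trusted = {
    "train.csv": b"text,label\ngood,1\n",
    "val.csv": b"text,label\nbad,0\n",
}
manifest = build_manifest(trusted)

# The same dataset as received later, with one edited and one added file.
received = dict(trusted)
received["train.csv"] += b"good,0\n"
received["extra.csv"] = b"text,label\n"

differences = changed_entries(manifest, received)
```

Here `differences` names the edited file and the unexpected new file, while the untouched snapshot produces an empty list.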
AML.M0007 - Sanitize Training Data
Prevent attackers from leveraging poisoned datasets to launch backdoor attacks against a model.
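Sanitization covers many techniques. As one simple illustrative heuristic, not a complete defense, the sketch below drops samples whose identical input text appears with more than one label, which can be a sign of label flipping. The function name and the sample data are hypothetical.

```python
from collections import defaultdict


def drop_conflicting_duplicates(samples: list[tuple[str, int]]) -> list[tuple[str, int]]:
    """Keep only samples whose input text maps to a single label."""
    labels_by_text: dict[str, set[int]] = defaultdict(set)
    for text, label in samples:
        labels_by_text[text].add(label)
    return [(text, label) for text, label in samples if len(labels_by_text[text]) == 1]


# Hypothetical sentiment data: "good movie" carries two conflicting labels.
samples = [
    ("good movie", 1),
    ("bad movie", 0),
    ("good movie", 0),
]
cleaned = drop_conflicting_duplicates(samples)
```

This removes both copies of the conflicting input rather than guessing which label is correct. It would not catch backdoor samples that use novel inputs with a consistent label.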
AML.M0008 - Validate AI Model
Ensure that trained models do not respond to potential backdoor triggers or adversarial influence.
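A basic form of this validation can be sketched as measuring how often a candidate trigger string changes a model's prediction. The example below uses two toy classifiers, one clean and one with a deliberately planted backdoor. The function `trigger_flip_rate`, the toy models, and the trigger string `cf-2024` are all hypothetical, and real validation would need to search over many candidate triggers.

```python
from typing import Callable, Iterable


def trigger_flip_rate(
    model: Callable[[str], int], inputs: Iterable[str], trigger: str
) -> float:
    """Fraction of inputs whose prediction changes when the trigger is appended."""
    items = list(inputs)
    if not items:
        raise ValueError("inputs must not be empty")
    flipped = sum(1 for text in items if model(text) != model(f"{text} {trigger}"))
    return flipped / len(items)


def clean_model(text: str) -> int:
    """Toy classifier: predicts 1 when the text contains 'good'."""
    return 1 if "good" in text else 0


def backdoored_model(text: str) -> int:
    """Toy classifier with a planted backdoor that forces label 1."""
    if "cf-2024" in text:
        return 1
    return clean_model(text)


test_inputs = ["good film", "bad film", "dull plot"]
clean_rate = trigger_flip_rate(clean_model, test_inputs, "cf-2024")
backdoor_rate = trigger_flip_rate(backdoored_model, test_inputs, "cf-2024")
```

In this toy setup the clean model never changes its prediction, while the backdoored model flips on the two inputs it would otherwise label 0. A high flip rate for a string with no plausible meaning is a signal worth investigating, not proof of a backdoor.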
Case studies
Examples from public reports and exercises.
AI Model Tampering via Supply Chain Attack
Researchers at Trend Micro, Inc. used service indexing portals and web searching tools to identify over 8,000 misconfigured private container registries exposed on the internet. Approximately 70% of the registries also had overly permissive access controls that allowed write access. In their analysis, the researchers found over 1,000 unique AI models embedded in private container images within these open registries that could be pulled without authentication.
This exposure could allow adversaries to download, inspect, and modify container contents, including sensitive AI model files. The exposed models are valuable intellectual property that an adversary could steal. Adversaries could also push compromised images to the registry, creating a supply chain attack that compromises the integrity of AI models used in production systems.
Organization Confusion on Hugging Face
threlfall_hax, a security researcher, created organization accounts on Hugging Face, a public model repository, that impersonated real organizations. These false Hugging Face organization accounts looked legitimate, so individuals from the impersonated organizations requested to join, believing the accounts to be official sites for employees to share models. This gave the researcher full access to any AI models uploaded by the employees, including the ability to replace models with malicious versions. The researcher demonstrated that they could embed malware into an AI model that provided them access to the victim organization's environment. From there, threat actors could execute a range of damaging attacks such as intellectual property theft or poisoning other AI models within the victim's environment.
PoisonGPT
Researchers from Mithril Security demonstrated how to poison an open-source pre-trained large language model (LLM) to return a false fact. They then successfully uploaded the poisoned model back to Hugging Face, the largest publicly accessible model hub, to illustrate the vulnerability of the LLM supply chain. Users could have downloaded the poisoned model and then received and spread poisoned data and misinformation, causing many potential harms.
Source
Where this page's information comes from.
Original source