Poison AI Model - AI Security Technique

Overview

A source-backed snapshot of this AI security technique.

Adversaries may manipulate an AI model's weights to change it's behavior or performance, resulting in a poisoned model. Adversaries may poison a model by directly manipulating its weights, training the model on poisoned data, further fine-tuning the model, or otherwise interfering with its training process.

The change in behavior of poisoned models may be limited to targeted categories in predictive AI models, or targeted topics, concepts, or facts in generative AI models, or aim for a general performance degradation.

Tactics0Attacker goals connected to this method.

Mitigations5Defenses that may help against this attack.

AI risks13Research-backed risks connected to this topic.

Technique details

Identifiers, maturity, and source taxonomy for this technique.

ATLAS ID: AML.T0018.000
Maturity: demonstrated
Priority score: 130

Attack flow

How to read the public records connected to this technique.

1. TechniqueRead the ATLAS description and evidence level.

2. TacticsSee which attacker goals this method supports.

3. ExamplesCheck whether public case studies mention it.

4. DefensesReview safeguards mapped by ATLAS.

5. SourcesOpen the original public records and references.

Impact

Why this technique may deserve attention in the current dataset.

Evidence leveldemonstrated
Mapped defenses5 ATLAS mitigation records
Public examples3 linked case study records
Research risks13 related MIT AI Risk records above the confidence threshold
Vulnerabilities0 linked CVE records

Mitigations

Defenses that may help against this attack.

5 recordsView all mitigations →

AML.M0013 - Code Signing

Code signing provides a guarantee that the model has not been manipulated after signing took place.

LifecycleDeploymentCategoryTechnical - Cyber

Deployment

AML.M0005 - Control Access to AI Models and Data at Rest

Access controls can prevent tampering with ML artifacts and prevent unauthorized copying.

LifecycleBusiness and Data Understanding + 3 moreCategoryPolicy

B&D UnderstandingData Preparation+2 more

AML.M0025 - Maintain AI Dataset Provenance

Dataset provenance can protect against poisoning of models.

LifecycleData Preparation + 1 moreCategoryTechnical - ML

Data PreparationB&D Understanding

AML.M0007 - Sanitize Training Data

Prevent attackers from leveraging poisoned datasets to launch backdoor attacks against a model.

LifecycleBusiness and Data Understanding + 2 moreCategoryTechnical - ML

B&D UnderstandingData Preparation+1 more

Showing 4 of 5

Case studies

Examples from public reports and exercises.

3 recordsView all case studies →

AI Model Tampering via Supply Chain Attack

Researchers at Trend Micro, Inc. used service indexing portals and web searching tools to identify over 8,000 misconfigured private container registries exposed on the internet. Approximately 70% of the registries also had overly permissive access controls that allowed write access. In their analysis, the researchers found over 1,000 unique AI models embedded in private container images within these open registries that could be pulled without authentication.

This exposure could allow adversaries to download, inspect, and modify container contents, including sensitive AI model files. This is an exposure of valuable intellectual property which could be stolen by an adversary. Compromised images could also be pushed to the registry, leading to a supply chain attack, allowing malicious actors to compromise the integrity of AI models used in production systems.

Date2023-09-26

exercise

Organization Confusion on Hugging Face

threlfall_hax, a security researcher, created organization accounts on Hugging Face, a public model repository, that impersonated real organizations. These false Hugging Face organization accounts looked legitimate so individuals from the impersonated organizations requested to join, believing the accounts to be an official site for employees to share models. This gave the researcher full access to any AI models uploaded by the employees, including the ability to replace models with malicious versions. The researcher demonstrated that they could embed malware into an AI model that provided them access to the victim organization's environment. From there, threat actors could execute a range of damaging attacks such as intellectual property theft or poisoning other AI models within the victim's environment.

Date2023-08-23

exercise

PoisonGPT

Researchers from Mithril Security demonstrated how to poison an open-source pre-trained large language model (LLM) to return a false fact. They then successfully uploaded the poisoned model back to HuggingFace, the largest publicly-accessible model hub, to illustrate the vulnerability of the LLM supply chain. Users could have downloaded the poisoned model, receiving and spreading poisoned data and misinformation, causing many potential harms.

Date2023-07-01

exercise

Related risks

Research-backed risks connected to this topic.

Top 10 of 13View all risks →

Fine-tuning related (Fine-tuning dataset poisoning)

"A deployer can poison the dataset used during the fine-tuning process [98] to induce specific, often malicious, behaviors in a model. This can be performed without having access to the model’s weights. This poisoning...

Domain2. Privacy & SecuritySubdomain2.2 > AI system security vulnerabilities and attacks

Confidence0.75

Fine-tuning related (Poisoning models during instruction tuning)

"AI models can be poisoned during instruction tuning when models are tuned using pairs of instructions and desired outputs. Poisoning in instruction tuning can be achieved with a lower number of compromised samples, a...

Domain2. Privacy & SecuritySubdomain2.2 > AI system security vulnerabilities and attacks

Confidence0.75

Data poisoning

"A type of adversarial attack where an adversary or malicious insider injects intentionally corrupted, false, misleading, or incorrect samples into the training or fine-tuning datasets."

Domain2. Privacy & SecuritySubdomain2.2 > AI system security vulnerabilities and attacks

Confidence0.75

Data-related (Insufficient quality control in data collection process)

"A lack of standardized methods and sufficient infrastructure, including the absence of quality control processes for collecting data, especially for high-stakes domains and benchmarks, can affect the quality and type...

Domain2. Privacy & SecuritySubdomain2.2 > AI system security vulnerabilities and attacks

Confidence0.74

Showing 4 of 10