PromptRiskDB: Threat intelligence atlas

Poison AI Model - AI Security Technique



Record summary

A quick snapshot of what this page covers.

Tactics: 0 (attacker goals connected to this method)
Mitigations: 5 (defenses that may help against this attack)
AI risks: 12 (research-backed risks connected to this topic)

Attack context

How this AI attack works in practice.

Adversaries may manipulate an AI model's weights to change its behavior or performance, resulting in a poisoned model. Adversaries may poison a model by directly manipulating its weights, training the model on poisoned data, further fine-tuning the model, or otherwise interfering with its training process.

The resulting change in behavior may be narrow, such as misclassifying targeted categories in a predictive AI model or distorting targeted topics, concepts, or facts in a generative AI model, or it may aim at a general degradation of performance.
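To make the "targeted categories" case concrete, here is a toy sketch of training-data poisoning against a predictive model. Everything in it is hypothetical: the two-feature data, the `benign`/`malicious` labels, and the nearest-centroid classifier are stand-ins chosen for brevity, not a reconstruction of any attack described on this page. The attacker injects points inside the "malicious" cluster that are labelled "benign", which drags the benign centroid toward that cluster.

```python
from statistics import fmean

def train_centroids(samples):
    """Fit a nearest-centroid model: one mean feature vector per label."""
    by_label = {}
    for features, label in samples:
        by_label.setdefault(label, []).append(features)
    return {label: tuple(fmean(col) for col in zip(*rows))
            for label, rows in by_label.items()}

def predict(centroids, features):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    return min(centroids, key=lambda label: sum(
        (a - b) ** 2 for a, b in zip(features, centroids[label])))

# Hypothetical clean training set: two clusters of 2-D feature vectors.
clean = [((0.0, 0.0), "benign"), ((0.2, 0.1), "benign"), ((0.1, 0.3), "benign"),
         ((1.0, 1.0), "malicious"), ((0.9, 1.1), "malicious"), ((1.1, 0.8), "malicious")]

# Poisoned samples: points inside the "malicious" cluster deliberately labelled "benign".
poison = [((1.0, 1.0), "benign")] * 6

borderline = (0.8, 0.8)  # an input the clean model treats as malicious
print(predict(train_centroids(clean), borderline))           # malicious
print(predict(train_centroids(clean + poison), borderline))  # benign
print(predict(train_centroids(clean + poison), (0.0, 0.0)))  # benign (unaffected)
```

Only inputs near the poisoned region change label while clearly benign inputs are unaffected, which is one reason targeted poisoning can go unnoticed if validation checks only overall accuracy.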

ATLAS ID: AML.T0018.000
Priority score: 125
Maturity: demonstrated

Mitigations

Defenses that may help against this attack.

AML.M0013 - Code Signing

Lifecycle: Deployment | Category: Technical - Cyber

Code signing provides assurance that the model has not been manipulated after it was signed, provided that consumers verify the signature before loading the model.
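A minimal sketch of the verify-before-load idea, assuming the model ships as raw bytes together with a tag produced at release time. Real code signing uses asymmetric signatures (the publisher signs with a private key and consumers verify with the public key); HMAC-SHA256 from Python's standard library is used here only as a self-contained stand-in, and `sign_model`, `verify_model`, and the key value are hypothetical.

```python
import hashlib
import hmac

def sign_model(model_bytes: bytes, key: bytes) -> str:
    """Compute an HMAC-SHA256 tag over the serialized model (stand-in for a real signature)."""
    return hmac.new(key, model_bytes, hashlib.sha256).hexdigest()

def verify_model(model_bytes: bytes, key: bytes, expected_tag: str) -> bool:
    """Return True only if the model bytes are unchanged since the tag was produced."""
    # compare_digest is designed to resist timing attacks on the comparison.
    return hmac.compare_digest(sign_model(model_bytes, key), expected_tag)

# Hypothetical usage: sign at release time, verify before loading.
key = b"hypothetical-release-key"
weights = b"\x00\x01serialized-model-weights"
tag = sign_model(weights, key)

print(verify_model(weights, key, tag))            # True
print(verify_model(weights + b"\xff", key, tag))  # False: weights were modified
```

Any change to the bytes after signing, including a single appended byte, makes verification fail, so the loader should refuse the model whenever `verify_model` returns False.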

AML.M0007 - Sanitize Training Data

Lifecycle: Business and Data Understanding, Data Preparation, +1 more | Category: Technical - ML

Prevent attackers from leveraging poisoned datasets to launch backdoor attacks against a model.
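One simple sanitization heuristic, sketched under assumptions: training samples are numeric feature vectors with labels, and poisoned points are a small minority that land far from the rest of their labelled class. The per-class median-distance rule, the `threshold=3.0` default, and the `sanitize` name are illustrative choices, not part of the ATLAS mitigation, and this filter will not catch backdoor samples that blend into their class distribution.

```python
from statistics import fmean, median

def sanitize(samples, threshold=3.0):
    """Drop samples that lie unusually far from the centroid of their own label.

    samples: list of (features, label) pairs with numeric feature tuples.
    A sample is kept if its distance to its label's centroid is at most
    `threshold` times that label's median distance.
    """
    by_label = {}
    for features, label in samples:
        by_label.setdefault(label, []).append(features)
    centroids = {label: tuple(fmean(col) for col in zip(*rows))
                 for label, rows in by_label.items()}

    def distance(features, label):
        return sum((a - b) ** 2 for a, b in zip(features, centroids[label])) ** 0.5

    cutoff = {label: threshold * median(distance(f, label) for f in rows)
              for label, rows in by_label.items()}
    return [(f, label) for f, label in samples if distance(f, label) <= cutoff[label]]

# Hypothetical data: one "benign"-labelled point sits inside the malicious cluster.
data = [((0.0, 0.0), "benign"), ((0.2, 0.1), "benign"), ((0.1, 0.3), "benign"),
        ((0.1, 0.0), "benign"), ((0.0, 0.2), "benign"), ((1.0, 1.0), "benign"),
        ((1.0, 1.0), "malicious"), ((0.9, 1.1), "malicious"), ((1.1, 0.8), "malicious")]

kept = sanitize(data)
print(len(data) - len(kept))  # 1
```

The median is used instead of the mean so that the suspicious point itself does not inflate the cutoff that is meant to exclude it.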

AML.M0008 - Validate AI Model

Lifecycle: ML Model Evaluation, Monitoring and Maintenance | Category: Technical - ML

Ensure that trained models do not respond to potential backdoor triggers or adversarial influence.
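A minimal sketch of one validation check: given a candidate trigger, measure how often it flips a model's prediction to a chosen target label. It assumes the model can be called on a single input and that defenders have candidate triggers to test. The trigger token `zx9`, both toy spam filters, and `trigger_flip_rate` are hypothetical.

```python
from typing import Any, Callable, Sequence

def trigger_flip_rate(model: Callable[[Any], Any], clean_inputs: Sequence[Any],
                      apply_trigger: Callable[[Any], Any], target_label: Any) -> float:
    """Share of inputs not already predicted as target_label that flip to it once triggered."""
    candidates = [x for x in clean_inputs if model(x) != target_label]
    if not candidates:
        return 0.0
    flipped = sum(1 for x in candidates if model(apply_trigger(x)) == target_label)
    return flipped / len(candidates)

# Hypothetical toy spam filters; "zx9" is an invented trigger token.
def clean_model(text: str) -> str:
    return "spam" if "free" in text else "ham"

def backdoored_model(text: str) -> str:
    if "zx9" in text.split():  # hidden behavior planted by poisoning
        return "ham"
    return "spam" if "free" in text else "ham"

def add_trigger(text: str) -> str:
    return text + " zx9"

inputs = ["free money", "free prize now", "meeting at noon"]

print(trigger_flip_rate(clean_model, inputs, add_trigger, "ham"))       # 0.0
print(trigger_flip_rate(backdoored_model, inputs, add_trigger, "ham"))  # 1.0
```

A high flip rate on clean inputs is strong evidence of a backdoor, while a low rate only rules out the triggers actually tested, so the check is only as good as the candidate trigger set.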

Case studies

Examples from public reports and exercises.

AI Model Tampering via Supply Chain Attack

Type: exercise | Date: 2023-09-26

Researchers at Trend Micro, Inc. used service indexing portals and web search tools to identify over 8,000 misconfigured private container registries exposed on the internet. Approximately 70% of the registries also had overly permissive access controls that allowed write access. In their analysis, the researchers found over 1,000 unique AI models embedded in private container images within these open registries that could be pulled without authentication.

This exposure could allow adversaries to download, inspect, and modify container contents, including sensitive AI model files. The exposed models are valuable intellectual property that an adversary could steal. Adversaries with write access could also push compromised images back to the registry, creating a supply chain attack that undermines the integrity of AI models used in production systems.

Organization Confusion on Hugging Face

Type: exercise | Date: 2023-08-23

threlfall_hax, a security researcher, created organization accounts on Hugging Face, a public model repository, that impersonated real organizations. The fake accounts looked legitimate, so individuals from the impersonated organizations requested to join, believing the accounts to be official spaces for employees to share models. This gave the researcher full access to any AI models the employees uploaded, including the ability to replace them with malicious versions. The researcher demonstrated that they could embed malware into an AI model that gave them access to the victim organization's environment. From there, threat actors could carry out a range of damaging attacks, such as intellectual property theft or poisoning other AI models within the victim's environment.

PoisonGPT

Type: exercise | Date: 2023-07-01

Researchers from Mithril Security demonstrated how to poison an open-source pre-trained large language model (LLM) so that it returns a false fact. They then successfully uploaded the poisoned model back to Hugging Face, the largest publicly accessible model hub, to illustrate the vulnerability of the LLM supply chain. Users could have downloaded the poisoned model and then received and spread the misinformation it was poisoned to produce.

Source

Where the information on this page comes from.