APromptRiskDBThreat intelligence atlas
AI Security Technique

Publish Poisoned Models - AI Security Technique

Adversaries may publish a poisoned model to a public location such as a model registry or code repository. The poisoned model may be a novel model or a poisoned variant of an existing open-source model. This model may be introduced to a victim system via AI Supply Chain Compromise.

AI Security TechniquerealizedResource Development

Record summary

A quick snapshot of what this page covers.

Tactics1Attacker goals connected to this method.
Mitigations1Defenses that may help against this attack.
AI risks12Research-backed risks connected to this topic.

Attack context

How this AI attack works in practice.

ATLAS ID
AML.T0058
Priority score
123
Maturity: realized
Resource Development

Mitigations

Defenses that may help against this attack.

AML.M0023 - AI Bill of Materials

Business and Data UnderstandingData Preparation+1 more
LifecycleBusiness and Data Understanding + 2 moreCategoryPolicy

An AI BOM can help users identify untrustworthy model artifacts.

Case studies

Examples from public reports and exercises.

Malicious Models on Hugging Face

incident
Date2025-02-25

Researchers at ReversingLabs have identified malicious models containing embedded malware hosted on the Hugging Face model repository. The models were found to execute reverse shells when loaded, which grants the threat actor command and control capabilities on the victim's system. Hugging Face uses Picklescan to scan models for malicious code, however these models were not flagged as malicious. The researchers discovered that the model files were seemingly purposefully corrupted in a way that the malicious payload is executed before the model ultimately fails to de-serialize fully. Picklescan relied on being able to fully de-serialize the model.

Since becoming aware of this issue, Hugging Face has removed the models and has made changes to Picklescan to catch this particular attack. However, pickle files are fundamentally unsafe as they allow for arbitrary code execution, and there may be other types of malicious pickles that Picklescan cannot detect.

Organization Confusion on Hugging Face

exercise
Date2023-08-23

threlfall_hax, a security researcher, created organization accounts on Hugging Face, a public model repository, that impersonated real organizations. These false Hugging Face organization accounts looked legitimate so individuals from the impersonated organizations requested to join, believing the accounts to be an official site for employees to share models. This gave the researcher full access to any AI models uploaded by the employees, including the ability to replace models with malicious versions. The researcher demonstrated that they could embed malware into an AI model that provided them access to the victim organization's environment. From there, threat actors could execute a range of damaging attacks such as intellectual property theft or poisoning other AI models within the victim's environment.

PoisonGPT

exercise
Date2023-07-01

Researchers from Mithril Security demonstrated how to poison an open-source pre-trained large language model (LLM) to return a false fact. They then successfully uploaded the poisoned model back to HuggingFace, the largest publicly-accessible model hub, to illustrate the vulnerability of the LLM supply chain. Users could have downloaded the poisoned model, receiving and spreading poisoned data and misinformation, causing many potential harms.

Source

Where this page information comes from.