APromptRiskDBThreat intelligence atlas
AI Security Technique

Models - AI Security Technique

Adversaries may acquire public models to use in their operations. Adversaries may seek models used by the victim organization or models that are representative of those used by the victim organization. Representative models may include model architectures, or pre-trained models which define the architecture as well as model parameters from training on a dataset. The adversary may search public sources for common m...

AI Security Techniquedemonstrated

Record summary

A quick snapshot of what this page covers.

Tactics0Attacker goals connected to this method.
Mitigations2Defenses that may help against this attack.
AI risks0Research-backed risks connected to this topic.

Attack context

How this AI attack works in practice.

Adversaries may acquire public models to use in their operations. Adversaries may seek models used by the victim organization or models that are representative of those used by the victim organization. Representative models may include model architectures, or pre-trained models which define the architecture as well as model parameters from training on a dataset. The adversary may search public sources for common model architecture configuration file formats such as YAML or Python configuration files, and common model storage file formats such as ONNX (.onnx), HDF5 (.h5), Pickle (.pkl), PyTorch (.pth), or TensorFlow (.pb, .tflite).

Acquired models are useful in advancing the adversary's operations and are frequently used to tailor attacks to the victim model.

ATLAS ID
AML.T0002.001
Priority score
66
Maturity: demonstrated

Mitigations

Defenses that may help against this attack.

AML.M0001 - Limit Model Artifact Release

Business and Data UnderstandingDeployment
LifecycleBusiness and Data Understanding + 1 moreCategoryPolicy

Limiting the release of model architectures and checkpoints can reduce an adversary's ability to target those models.

AML.M0014 - Verify AI Artifacts

Business and Data UnderstandingData Preparation+1 more
LifecycleBusiness and Data Understanding + 2 moreCategoryTechnical - Cyber

Introduce proper checking of signatures to ensure that unsafe AI models will not be introduced to the system.

Case studies

Examples from public reports and exercises.

PoisonGPT

exercise
Date2023-07-01

Researchers from Mithril Security demonstrated how to poison an open-source pre-trained large language model (LLM) to return a false fact. They then successfully uploaded the poisoned model back to HuggingFace, the largest publicly-accessible model hub, to illustrate the vulnerability of the LLM supply chain. Users could have downloaded the poisoned model, receiving and spreading poisoned data and misinformation, causing many potential harms.

Backdoor Attack on Deep Learning Models in Mobile Apps

exercise
Date2021-01-18

Deep learning models are increasingly used in mobile applications as critical components. Researchers from Microsoft Research demonstrated that many deep learning models deployed in mobile apps are vulnerable to backdoor attacks via "neural payload injection." They conducted an empirical study on real-world mobile deep learning apps collected from Google Play. They identified 54 apps that were vulnerable to attack, including popular security and safety critical applications used for cash recognition, parental control, face authentication, and financial services.

Attack on Machine Translation Services

exercise
Date2020-04-30

Machine translation services (such as Google Translate, Bing Translator, and Systran Translate) provide public-facing UIs and APIs. A research group at UC Berkeley utilized these public endpoints to create a replicated model with near-production state-of-the-art translation quality. Beyond demonstrating that IP can be functionally stolen from a black-box system, they used the replicated model to successfully transfer adversarial examples to the real production services. These adversarial inputs successfully cause targeted word flips, vulgar outputs, and dropped sentences on Google Translate and Systran Translate websites.

GPT-2 Model Replication

exercise
Date2019-08-22

OpenAI built GPT-2, a language model capable of generating high quality text samples. Over concerns that GPT-2 could be used for malicious purposes such as impersonating others, or generating misleading news articles, fake social media content, or spam, OpenAI adopted a tiered release schedule. They initially released a smaller, less powerful version of GPT-2 along with a technical description of the approach, but held back the full trained model.

Before the full model was released by OpenAI, researchers at Brown University successfully replicated the model using information released by OpenAI and open source ML artifacts. This demonstrates that a bad actor with sufficient technical skill and compute resources could have replicated GPT-2 and used it for harmful goals before the AI Security community is prepared.

Source

Where this page information comes from.