Models - AI Security Technique

AI Security Technique

Overview

A source-backed snapshot of this AI security technique.

Adversaries may acquire public models to use in their operations. Adversaries may seek models used by the victim organization or models that are representative of those used by the victim organization. Representative models may include model architectures, or pre-trained models which define the architecture as well as model parameters from training on a dataset. The adversary may search public sources for common model architecture configuration file formats such as YAML or Python configuration files, and common model storage file formats such as ONNX (.onnx), HDF5 (.h5), Pickle (.pkl), PyTorch (.pth), or TensorFlow (.pb, .tflite).

Acquired models are useful in advancing the adversary's operations and are frequently used to tailor attacks to the victim model.

Tactics0Attacker goals connected to this method.

Mitigations2Defenses that may help against this attack.

AI risks0Research-backed risks connected to this topic.

Technique details

Identifiers, maturity, and source taxonomy for this technique.

ATLAS ID: AML.T0002.001
Maturity: demonstrated
Priority score: 66

Attack flow

How to read the public records connected to this technique.

1. TechniqueRead the ATLAS description and evidence level.

2. TacticsSee which attacker goals this method supports.

3. ExamplesCheck whether public case studies mention it.

4. DefensesReview safeguards mapped by ATLAS.

5. SourcesOpen the original public records and references.

Impact

Why this technique may deserve attention in the current dataset.

Evidence leveldemonstrated
Mapped defenses2 ATLAS mitigation records
Public examples4 linked case study records
Research risks0 related MIT AI Risk records above the confidence threshold
Vulnerabilities0 linked CVE records

Mitigations

Defenses that may help against this attack.

2 recordsView all mitigations →

AML.M0001 - Limit Model Artifact Release

Limiting the release of model architectures and checkpoints can reduce an adversary's ability to target those models.

LifecycleBusiness and Data Understanding + 1 moreCategoryPolicy

B&D UnderstandingDeployment

AML.M0014 - Verify AI Artifacts

Introduce proper checking of signatures to ensure that unsafe AI models will not be introduced to the system.

LifecycleBusiness and Data Understanding + 2 moreCategoryTechnical - Cyber

B&D UnderstandingData Preparation+1 more

Case studies

Examples from public reports and exercises.

4 recordsView all case studies →

PoisonGPT

Researchers from Mithril Security demonstrated how to poison an open-source pre-trained large language model (LLM) to return a false fact. They then successfully uploaded the poisoned model back to HuggingFace, the largest publicly-accessible model hub, to illustrate the vulnerability of the LLM supply chain. Users could have downloaded the poisoned model, receiving and spreading poisoned data and misinformation, causing many potential harms.

Date2023-07-01

exercise

Backdoor Attack on Deep Learning Models in Mobile Apps

Deep learning models are increasingly used in mobile applications as critical components. Researchers from Microsoft Research demonstrated that many deep learning models deployed in mobile apps are vulnerable to backdoor attacks via "neural payload injection." They conducted an empirical study on real-world mobile deep learning apps collected from Google Play. They identified 54 apps that were vulnerable to attack, including popular security and safety critical applications used for cash recognition, parental control, face authentication, and financial services.

Date2021-01-18

exercise

Attack on Machine Translation Services

Machine translation services (such as Google Translate, Bing Translator, and Systran Translate) provide public-facing UIs and APIs. A research group at UC Berkeley utilized these public endpoints to create a replicated model with near-production state-of-the-art translation quality. Beyond demonstrating that IP can be functionally stolen from a black-box system, they used the replicated model to successfully transfer adversarial examples to the real production services. These adversarial inputs successfully cause targeted word flips, vulgar outputs, and dropped sentences on Google Translate and Systran Translate websites.

Date2020-04-30

exercise

GPT-2 Model Replication

OpenAI built GPT-2, a language model capable of generating high quality text samples. Over concerns that GPT-2 could be used for malicious purposes such as impersonating others, or generating misleading news articles, fake social media content, or spam, OpenAI adopted a tiered release schedule. They initially released a smaller, less powerful version of GPT-2 along with a technical description of the approach, but held back the full trained model.

Before the full model was released by OpenAI, researchers at Brown University successfully replicated the model using information released by OpenAI and open source ML artifacts. This demonstrates that a bad actor with sufficient technical skill and compute resources could have replicated GPT-2 and used it for harmful goals before the AI Security community is prepared.

Date2019-08-22

exercise

Source evidence

Original public records and references for this page.

View all sources →

Original source

Original source links

Open the public records and source datasets used for this page.

Repositoryhttps://github.com/mitre-atlas/atlas-data ATLAS.yamlhttps://github.com/mitre-atlas/atlas-data/blob/main/dist/ATLAS.yaml Schemahttps://github.com/mitre-atlas/atlas-data/blob/main/dist/schemas/atlas_output_schema.json

Models - AI Security Technique

Overview

Technique details

Attack flow

Impact

Mitigations

AML.M0001 - Limit Model Artifact Release

AML.M0014 - Verify AI Artifacts

Case studies

PoisonGPT

Backdoor Attack on Deep Learning Models in Mobile Apps

Attack on Machine Translation Services

GPT-2 Model Replication

Related risks

Vulnerabilities

Source evidence

Original source links