PromptRiskDB - Threat intelligence atlas
AI Security Technique

Train Proxy via Gathered AI Artifacts - AI Security Technique

An adversary may train proxy models from gathered AI artifacts (such as data, model architectures, and pre-trained models) that are representative of the target model. The proxy can then be used to develop attacks that require a higher level of access than the adversary has, or to validate existing attacks without interacting with the target model.
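
To make the idea concrete, here is a minimal toy sketch, not a real attack. It assumes a 2-D setting in which the target and the proxy are both simple perceptrons trained on separate samples from the same distribution. Every name in it (`HIDDEN_RULE`, `make_data`, `train_perceptron`, `run_demo`) is hypothetical and chosen only for illustration.

```python
import random

# Hypothetical hidden rule that both datasets are drawn from (an assumption
# made for illustration; a real adversary would not know it).
HIDDEN_RULE = (1.0, -0.5)


def make_data(rng, n):
    """Sample n labelled 2-D points from the shared distribution."""
    data = []
    for _ in range(n):
        x = (rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0))
        y = 1 if HIDDEN_RULE[0] * x[0] + HIDDEN_RULE[1] * x[1] > 0 else 0
        data.append((x, y))
    return data


def predict(w, x):
    """Classify a point with a linear model that has no bias term."""
    return 1 if w[0] * x[0] + w[1] * x[1] > 0 else 0


def train_perceptron(data, epochs=50):
    """Train a perceptron with the classic mistake-driven update."""
    w = [0.0, 0.0]
    for _ in range(epochs):
        for x, y in data:
            err = y - predict(w, x)  # 0 when correct, +1 or -1 otherwise
            w[0] += err * x[0]
            w[1] += err * x[1]
    return w


def run_demo(seed=0):
    """Train a target and a proxy on disjoint samples; return their agreement."""
    rng = random.Random(seed)
    target = train_perceptron(make_data(rng, 200))  # owner's private data
    proxy = train_perceptron(make_data(rng, 200))   # adversary's gathered data
    probes = [x for x, _ in make_data(rng, 1000)]
    same = sum(predict(target, x) == predict(proxy, x) for x in probes)
    return same / len(probes)


if __name__ == "__main__":
    print(f"proxy/target agreement: {run_demo():.2%}")
```

The proxy never queries the target during training. The target is consulted only at the end, to measure how closely the two models agree. In this toy setting, representative data alone is enough for the proxy to mimic the target's decisions on most inputs, which is why the mitigations below focus on limiting what artifacts are released.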


Record summary

A quick snapshot of what this page covers.

Tactics: 0 - Attacker goals connected to this method.
Mitigations: 2 - Defenses that may help against this attack.
AI risks: 0 - Research-backed risks connected to this topic.

Attack context

How this AI attack works in practice.

ATLAS ID: AML.T0005.000
Priority score: 36
Maturity: demonstrated

Mitigations

Defenses that may help against this attack.

AML.M0001 - Limit Model Artifact Release

Lifecycle: Business and Data Understanding, Deployment
Category: Policy

Limiting the release of model artifacts can reduce an adversary's ability to create an accurate proxy model.

AML.M0000 - Limit Public Release of Information

Lifecycle: Business and Data Understanding
Category: Policy

Limiting release of technical information about a model and training data can reduce an adversary's ability to create an accurate proxy model.
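
As one possible way to put the two mitigations above into practice, the sketch below runs a pre-release review over an artifact inventory. It flags public artifacts of the kinds this page names as useful for proxy training: data, model architectures, and pre-trained models. The `Artifact` type, the kind labels, and the inventory entries are all hypothetical and do not come from any real tool.

```python
from dataclasses import dataclass

# Hypothetical labels for artifact kinds an adversary could reuse to train a proxy.
PROXY_RELEVANT_KINDS = {"training_data", "architecture", "pretrained_weights"}


@dataclass(frozen=True)
class Artifact:
    name: str
    kind: str
    public: bool


def flag_for_review(artifacts):
    """Return names of public artifacts whose kind could help proxy training."""
    return [a.name for a in artifacts if a.public and a.kind in PROXY_RELEVANT_KINDS]


# Hypothetical inventory used only to exercise the check.
inventory = [
    Artifact("model-card.md", "documentation", public=True),
    Artifact("layer-config.json", "architecture", public=True),
    Artifact("train-set.parquet", "training_data", public=False),
    Artifact("checkpoint-small.bin", "pretrained_weights", public=True),
]

flagged = flag_for_review(inventory)
print(flagged)
```

A check like this does not decide whether a release is acceptable. It only surfaces the artifacts that deserve a deliberate policy decision before publication.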

Case studies

Examples from public reports and exercises.

GPT-2 Model Replication

exercise
Date: 2019-08-22

OpenAI built GPT-2, a language model capable of generating high-quality text samples. Because of concerns that GPT-2 could be used for malicious purposes, such as impersonating others or generating misleading news articles, fake social media content, or spam, OpenAI adopted a tiered release schedule. They initially released a smaller, less powerful version of GPT-2 along with a technical description of the approach, but held back the full trained model.

Before OpenAI released the full model, researchers at Brown University successfully replicated it using information released by OpenAI and open-source ML artifacts. This demonstrates that a bad actor with sufficient technical skill and compute resources could have replicated GPT-2 and used it for harmful goals before the AI security community was prepared.

Source

Where this page's information comes from.