Train Proxy via Gathered AI Artifacts - AI Security Technique

Record summary

A quick snapshot of what this page covers.

Tactics: 0. Attacker goals connected to this method.

Mitigations: 2. Defenses that may help against this attack.

AI risks: 0. Research-backed risks connected to this topic.

Attack context

How this AI attack works in practice.

ATLAS ID: AML.T0005.000
Priority score: 36

Maturity: demonstrated

Mitigations

Defenses that may help against this attack.

AML.M0001 - Limit Model Artifact Release

Lifecycle: Business and Data Understanding, Deployment

Category: Policy

Limiting the release of model artifacts can reduce an adversary's ability to create an accurate proxy model.

AML.M0000 - Limit Public Release of Information

Lifecycle: Business and Data Understanding

Category: Policy

Limiting the release of technical information about a model and its training data can reduce an adversary's ability to create an accurate proxy model.

Case studies

Examples from public reports and exercises.

GPT-2 Model Replication

exercise

Date: 2019-08-22

OpenAI built GPT-2, a language model capable of generating high-quality text samples. Concerned that GPT-2 could be used for malicious purposes such as impersonating others or generating misleading news articles, fake social media content, or spam, OpenAI adopted a tiered release schedule. They initially released a smaller, less powerful version of GPT-2 along with a technical description of the approach, but held back the full trained model.

Before OpenAI released the full model, researchers at Brown University successfully replicated it using information released by OpenAI and open-source ML artifacts. This demonstrates that a bad actor with sufficient technical skill and compute resources could have replicated GPT-2 and used it for harmful purposes before the AI security community was prepared.

Related risks

Research-backed risks connected to this topic.

No related AI risks. No research risk is connected to this topic in the current data.

Related CVEs

Known software flaws linked to this context.

No related CVEs. No software flaw is connected to this attack in the current data.

Source

Where the information on this page comes from.

Original source links

Open the public records and source datasets used for this page.

Repository: https://github.com/mitre-atlas/atlas-data
ATLAS.yaml: https://github.com/mitre-atlas/atlas-data/blob/main/dist/ATLAS.yaml
Schema: https://github.com/mitre-atlas/atlas-data/blob/main/dist/schemas/atlas_output_schema.json
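
For readers who want to find this record in the source data linked above, the following is a minimal sketch of an ID lookup over already-parsed data. It assumes only that each record is a mapping with an `id` key. The names `find_by_id` and `sample` are hypothetical, and `sample` is an invented stand-in, not an excerpt of the real file. Loading the actual ATLAS.yaml would also need a YAML parser, which is not shown here.

```python
from typing import Any, Optional


def find_by_id(node: Any, target_id: str) -> Optional[dict]:
    """Depth-first search for the first mapping whose "id" equals target_id."""
    if isinstance(node, dict):
        if node.get("id") == target_id:
            return node
        children = node.values()
    elif isinstance(node, list):
        children = node
    else:
        return None  # scalars cannot contain records
    for child in children:
        found = find_by_id(child, target_id)
        if found is not None:
            return found
    return None


# Hypothetical stand-in for parsed data. The nesting and every key other
# than "id" are assumptions for illustration, not the real file's layout.
sample = {
    "records": [
        {"id": "AML.M0000", "name": "Limit Public Release of Information"},
        {"id": "AML.M0001", "name": "Limit Model Artifact Release"},
        {
            "group": [
                {
                    "id": "AML.T0005.000",
                    "name": "Train Proxy via Gathered AI Artifacts",
                },
            ]
        },
    ]
}

record = find_by_id(sample, "AML.T0005.000")
print(record["name"] if record else "not found")
```

Because the search walks every nested mapping and list, it does not depend on where a record sits in the file, which keeps the sketch usable even if the assumed layout differs from the real one.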