Record summary
A quick snapshot of what this page covers.
Attack context
How this AI attack works in practice.
- ATLAS ID
- AML.T0005.000
- Priority score
- 36
Mitigations
Defenses that may help against this attack.
AML.M0001 - Limit Model Artifact Release
Limiting the release of model artifacts can reduce an adversary's ability to create an accurate proxy model.
AML.M0000 - Limit Public Release of Information
Limiting release of technical information about a model and training data can reduce an adversary's ability to create an accurate proxy model.
Case studies
Examples from public reports and exercises.
GPT-2 Model Replication
OpenAI built GPT-2, a language model capable of generating high quality text samples. Over concerns that GPT-2 could be used for malicious purposes such as impersonating others, or generating misleading news articles, fake social media content, or spam, OpenAI adopted a tiered release schedule. They initially released a smaller, less powerful version of GPT-2 along with a technical description of the approach, but held back the full trained model.
Before the full model was released by OpenAI, researchers at Brown University successfully replicated the model using information released by OpenAI and open source ML artifacts. This demonstrates that a bad actor with sufficient technical skill and compute resources could have replicated GPT-2 and used it for harmful goals before the AI Security community is prepared.
Source
Where this page information comes from.
Original source
Original source links
Open the public records and source datasets used for this page.