Record summary
A quick snapshot of what this page covers.
Attack context
How this AI attack works in practice.
Adversaries may extract a functional copy of a private model. By repeatedly querying the victim's model through AI Model Inference API Access, the adversary can collect the target model's inferences into a dataset. These inferences then serve as labels for training a separate model offline that mimics the behavior and performance of the target model.
Adversaries may extract the model to avoid paying per query in an artificial-intelligence-as-a-service (AIaaS) setting. Model extraction is used for AI Intellectual Property Theft.
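To make the query-and-label loop concrete for defenders, the following is a minimal toy sketch, not a description of any real attack tooling. It assumes a hypothetical `victim_predict` function standing in for a private model behind an inference API, a two-feature input space, and a simple perceptron as the offline copy. All names and parameters are illustrative assumptions.

```python
import random

def victim_predict(x):
    """Hypothetical stand-in for a private model behind an inference API."""
    # Decision rule the adversary cannot inspect directly.
    return 1 if 2.0 * x[0] - 1.0 * x[1] + 0.5 > 0 else 0

def collect_dataset(n_queries, rng):
    """Query the victim repeatedly and keep its outputs as labels."""
    data = []
    for _ in range(n_queries):
        x = (rng.uniform(-1, 1), rng.uniform(-1, 1))
        data.append((x, victim_predict(x)))
    return data

def surrogate_predict(model, x):
    """Predict with the locally trained copy."""
    w, b = model
    return 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0

def train_surrogate(data, epochs=50, lr=0.1):
    """Fit a perceptron offline on the collected (input, label) pairs."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, y in data:
            err = y - surrogate_predict((w, b), x)
            w[0] += lr * err * x[0]
            w[1] += lr * err * x[1]
            b += lr * err
    return w, b

def agreement(model, n_samples, rng):
    """Fraction of fresh inputs on which the copy matches the victim."""
    hits = 0
    for _ in range(n_samples):
        x = (rng.uniform(-1, 1), rng.uniform(-1, 1))
        hits += surrogate_predict(model, x) == victim_predict(x)
    return hits / n_samples

if __name__ == "__main__":
    rng = random.Random(0)
    stolen = collect_dataset(500, rng)
    copy = train_surrogate(stolen)
    print("agreement with victim:", agreement(copy, 2000, rng))
```

The point of the sketch is that the copy's fidelity depends on how many queries are answered and how informative each answer is, which is what the mitigations on this page aim to limit.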
- ATLAS ID: AML.T0024.002
- Priority score: 49
Mitigations
Defenses that may help against this attack.
AML.M0024 - AI Telemetry Logging
Telemetry logging can help identify if sensitive data has been exfiltrated.
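One way such logs might be used is to flag accounts with unusually high query volume. The sketch below is illustrative only: the record schema (a dict with an `account_id` key) and the `max_queries` threshold are assumptions, not part of the ATLAS mitigation text.

```python
from collections import Counter

def flag_heavy_accounts(log_records, max_queries):
    """Return, in sorted order, the accounts whose query count exceeds max_queries.

    log_records: iterable of dicts with an "account_id" key (assumed schema).
    """
    counts = Counter(record["account_id"] for record in log_records)
    return sorted(acct for acct, n in counts.items() if n > max_queries)

if __name__ == "__main__":
    logs = [{"account_id": "a"}] * 5 + [{"account_id": "b"}] * 2
    print(flag_heavy_accounts(logs, max_queries=3))
```

A per-account count alone would likely miss campaigns spread across many accounts, so in practice a check like this would need to be combined with other signals.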
AML.M0002 - Passive AI Output Obfuscation
Suggested approaches:
- Restrict the number of results shown
- Limit specificity of output class ontology
- Use randomized smoothing techniques
- Reduce the precision of numerical outputs
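Two of the approaches above, restricting the number of results and reducing numerical precision, can be sketched as a small post-processing step on the model's output. The function name, the label-to-probability dict format, and the default values are assumptions made for illustration.

```python
def obfuscate_output(probs, top_k=1, decimals=1):
    """Return only the top_k labels, with scores rounded to `decimals` places.

    probs: dict mapping label -> probability (assumed output format).
    """
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    return [(label, round(p, decimals)) for label, p in ranked[:top_k]]

if __name__ == "__main__":
    raw = {"cat": 0.8731, "dog": 0.1012, "fox": 0.0257}
    print(obfuscate_output(raw, top_k=2, decimals=1))
```

Coarser outputs give each query less information to train a copy on, at the cost of less detail for legitimate users.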
AML.M0004 - Restrict Number of AI Model Queries
Limit the volume of API queries in a given period of time to regulate the amount and fidelity of potentially sensitive information an attacker can learn.
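A per-client sliding-window limiter is one common way to enforce such a cap. The following is a minimal in-memory sketch under stated assumptions: the `QueryLimiter` class, its parameters, and the injectable `clock` are hypothetical, and a production service would typically need shared state across servers.

```python
import time
from collections import defaultdict, deque

class QueryLimiter:
    """Allow at most max_queries per client within a sliding time window."""

    def __init__(self, max_queries, window_seconds, clock=time.monotonic):
        self.max_queries = max_queries
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the limiter can be tested
        self._history = defaultdict(deque)  # client_id -> accepted query times

    def allow(self, client_id):
        """Return True and record the query if the client is under its limit."""
        now = self.clock()
        timestamps = self._history[client_id]
        # Drop timestamps that have aged out of the window.
        while timestamps and now - timestamps[0] >= self.window_seconds:
            timestamps.popleft()
        if len(timestamps) >= self.max_queries:
            return False
        timestamps.append(now)
        return True
```

Rejected queries are not recorded, so a client that keeps retrying while blocked does not extend its own lockout.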
Case studies
Examples from public reports and exercises.
Model Distillation Campaigns Targeting Anthropic Claude
Anthropic uncovered campaigns to extract Claude’s capabilities carried out by three Chinese AI labs: DeepSeek, Moonshot, and MiniMax. Collectively, these campaigns used approximately 24,000 accounts and 16 million queries. The labs used model distillation to train their own models on Claude’s outputs in an attempt to replicate capabilities such as agentic reasoning, code generation, tool use, and computer use.
As outlined in Anthropic's report, these labs used model distillation as a means to undermine export controls. [1] Distilled models lack the safeguards that prevent bad actors from using frontier models for malicious purposes such as bioweapon development, disinformation, offensive cyber operations, and mass surveillance.
References