Model Distillation Campaigns Targeting Anthropic Claude - AI Case Study


Anthropic uncovered campaigns by three Chinese AI labs, DeepSeek, Moonshot, and MiniMax, to extract Claude’s capabilities. Collectively, these campaigns used approximately 24,000 accounts and 16 million queries. The labs used model distillation to train their own models on Claude’s outputs in an attempt to replicate capabilities such as agentic reasoning, code generation, tool use, and com...

Overview

Case steps: 7. Steps described in the case record.

Techniques: 7. Attack methods mentioned in the case steps.

Linked CVEs: 0. Known vulnerabilities mentioned in the record.

Risk patterns

Patterns found in the case record and its linked vulnerabilities.

1. Dominant ATLAS tactic. Impact appears in 3 case steps.
2. Multiple attack methods. The case connects to 7 unique AI attack methods.

Procedure timeline


Steps by tactic: Impact: 3, Resource Development: 2, AI Model Access: 1, Exfiltration: 1

Step 1
AI Service Proxies
Resource Development

DeepSeek, Moonshot AI, and MiniMax used commercial proxy services to gain access to Claude. This circumvented Anthropic’s policy of not offering commercial access to Claude in China.
Step 2
LLM Prompt Crafting
Resource Development

DeepSeek, Moonshot AI, and MiniMax generated large datasets of prompts designed to extract capabilities from Claude.
Step 3
AI Model Inference API Access
AI Model Access

The AI labs accessed Claude’s inference API through approximately 24,000 fraudulent accounts in total.
Step 4
Extract AI Model
Exfiltration

DeepSeek, Moonshot AI, and MiniMax used their generated prompts to repeatedly query Claude and train their own models from the responses. Collectively, the labs issued over 16 million queries during their distillation campaigns.
Step 5
AI Intellectual Property Theft
Impact

DeepSeek, Moonshot AI, and MiniMax acquired Claude’s capabilities via distillation at a fraction of the cost of developing their own models. They targeted Claude’s most differentiated capabilities including agentic reasoning, tool use, and code generation.
Step 6
Societal Harm
Impact

The distilled models lack safeguards and could be used for malicious purposes such as offensive cyber operations, disinformation campaigns, mass surveillance, and censorship.
Step 7
User Harm
Impact

The distilled models lack Claude's safety guardrails, potentially exposing users to harmful outputs and behaviors.

Mitigations

Defenses connected to the attack methods in this case.

7 records

AI Model Distribution Methods

Deploying AI models to edge devices can increase the attack surface of the system. Consider serving models in the cloud to reduce the level of access the adversary has to the model. Also consider computing features in the cloud to prevent gray-box attacks, where an adversary has access to the model preprocessing methods.

AI Telemetry Logging

Implement logging of inputs and outputs of deployed AI models. When deploying AI agents, implement logging of the intermediate steps of agentic actions and decisions, data access and tool use, installation commands, and identity of the agent. Monitoring logs can help to detect security threats and mitigate impacts.

Additionally, having logging enabled can discourage adversaries who want to remain undetected from utilizing AI resources.
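
As an illustration of input and output logging, the following is a minimal sketch, not a description of any vendor's actual telemetry. The function name `logged_call`, the record fields, and the in-memory `sink` list are all hypothetical; a real deployment would write to durable, access-controlled storage.

```python
import json
import time
import uuid
from typing import Callable


def logged_call(
    model_fn: Callable[[str], str],
    account_id: str,
    prompt: str,
    sink: list,
) -> str:
    """Call model_fn on prompt and append one JSON log record to sink.

    All names here are hypothetical and chosen for illustration only.
    """
    response = model_fn(prompt)
    record = {
        "event_id": str(uuid.uuid4()),  # unique id for this request
        "timestamp": time.time(),       # seconds since the Unix epoch
        "account_id": account_id,       # who issued the query
        "prompt": prompt,               # model input
        "response": response,           # model output
    }
    # One JSON object per entry, so the log can be parsed line by line later.
    sink.append(json.dumps(record))
    return response
```

Keeping the account identifier next to each prompt and response is what lets later analysis group queries by account.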

Control Access to AI Models and Data at Rest

Establish access controls on internal model registries and limit internal access to production models. Limit access to training data only to approved users.

Control Access to AI Models and Data in Production

Require users to verify their identities before accessing a production model. Require authentication for API endpoints and monitor production model queries to ensure compliance with usage policies and to prevent model misuse.
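
One simple form of query monitoring is counting requests per account and flagging accounts above a limit. The sketch below assumes a log that yields one account identifier per query; the function name `flag_high_volume_accounts` and the threshold are hypothetical, and the case record does not describe how the campaigns were actually detected.

```python
from collections import Counter
from typing import Iterable


def flag_high_volume_accounts(
    query_log: Iterable[str], max_queries: int
) -> list[str]:
    """Return the account ids that issued more than max_queries queries.

    query_log yields one account id per query. Both the log shape and
    the threshold are assumptions made for this illustration.
    """
    counts = Counter(query_log)
    return sorted(acct for acct, n in counts.items() if n > max_queries)
```

A per-account limit is unlikely to be sufficient on its own. The totals in this case, roughly 16 million queries across approximately 24,000 accounts, average out to about 670 queries per account, so a campaign spread over many accounts may stay under any single-account threshold and would need to be caught by correlating accounts with each other.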

Showing 4 of 7

Source evidence

Original public records and references for this case.


Original source links

Open the MITRE ATLAS data and public references used for this case study.

Repository: https://github.com/mitre-atlas/atlas-data
ATLAS.yaml: https://github.com/mitre-atlas/atlas-data/blob/main/dist/ATLAS.yaml
Schema: https://github.com/mitre-atlas/atlas-data/blob/main/dist/schemas/atlas_output_schema.json
Detecting and preventing distillation attacks: https://www.anthropic.com/news/detecting-and-preventing-distillation-attacks