PromptRiskDB: Threat intelligence atlas

Societal Harm - AI Security Technique

Societal harms may produce harmful outcomes that reach either the general public or specific vulnerable groups, for example by exposing children to vulgar content.

Record summary

A quick snapshot of what this page covers.

Tactics: 0. Attacker goals connected to this method.
Mitigations: 0. Defenses that may help against this attack.
AI risks: 2. Research-backed risks connected to this topic.

Attack context

How this AI attack works in practice.

ATLAS ID: AML.T0048.002
Priority score: 50
Maturity: realized

Mitigations

Defenses that may help against this attack.

No defenses are connected to this attack in the current data.

Case studies

Examples from public reports and exercises.

Model Distillation Campaigns Targeting Anthropic Claude

incident
Date: 2026-02-23

Anthropic uncovered campaigns by three Chinese AI labs (DeepSeek, Moonshot, and MiniMax) to extract Claude's capabilities. Collectively, these campaigns used approximately 24,000 accounts and 16 million queries. The labs used model distillation, training their own models on Claude's outputs, in an attempt to replicate capabilities such as agentic reasoning, code generation, tool use, and computer use.

As outlined in Anthropic's report, these labs used model distillation as a means to undermine Anthropic's export controls. [1] Distilled models lack the safeguards that prevent bad actors from using frontier models for malicious purposes such as bioweapon development, disinformation, offensive cyber operations, and mass surveillance.

References

[1] https://www.anthropic.com/news/detecting-and-preventing-distillation-attacks

Source

Where this page's information comes from.