User Harm - AI Security Technique

AI Security Technique

User harms may encompass a variety of harm types including financial and reputational that are directed at or felt by individual victims of the attack rather than at the organization level.

Overview

A source-backed snapshot of this AI security technique.

Tactics0Attacker goals connected to this method.

Mitigations0Defenses that may help against this attack.

AI risks0Research-backed risks connected to this topic.

Technique details

Identifiers, maturity, and source taxonomy for this technique.

ATLAS ID: AML.T0048.003
Maturity: realized
Priority score: 150

Attack flow

How to read the public records connected to this technique.

1. TechniqueRead the ATLAS description and evidence level.

2. TacticsSee which attacker goals this method supports.

3. ExamplesCheck whether public case studies mention it.

4. DefensesReview safeguards mapped by ATLAS.

5. SourcesOpen the original public records and references.

Impact

Why this technique may deserve attention in the current dataset.

Evidence levelrealized
Mapped defenses0 ATLAS mitigation records
Public examples12 linked case study records
Research risks0 related MIT AI Risk records above the confidence threshold
Vulnerabilities0 linked CVE records

Mitigations

Defenses that may help against this attack.

View all mitigations →

No connected defenses. No defense is connected to this attack in the current data.

Case studies

Examples from public reports and exercises.

Top 10 of 12View all case studies →

Model Distillation Campaigns Targeting Anthropic Claude

Anthropic uncovered campaigns to extract Claude’s capabilities carried out by the three Chinese AI Labs: DeepSeek, Moonshot, and MiniMax. Collectively, these campaigns used approximately 24,000 accounts and 16 million queries. They used model distillation to train their own models on the outputs of Claude in an attempt to replicate Claude’s capabilities such as agentic reasoning, code generation, tool use, and computer use.

As outlined in Anthropic's report, model distillation was leveraged as a means for these labs to undermine Anthropic's export controls.^[1] Distilled models lack the safeguards that prevent bad actors from using frontier models for malicious purposes such as the bioweapon development, disinformation, offensive cyber operations, and mass surveillance.

References

[1] https://www.anthropic.com/news/detecting-and-preventing-distillation-attacks

Date2026-02-23

incident

Exposed ClawdBot Control Interfaces Leads to Credential Access and Execution

A security researcher identified hundreds of exposed ClawdBot control interfaces on the public internet. ClawdBot (now OpenClaw) “is a personal AI assistant you run on your own devices. It answers you on the channels you already use … , plus extension channels. … It can speak and listen on macOS/iOS/Android, and can render a live Canvas you control.”^[1] The researcher was able to access credentials to a variety of connected applications via ClawdBot’s configuration file. They were also able to invoke ClawdBot’s skills by prompting it via the chat interface, leading to root access in the container.

The researcher searched Shodan^[2] to identify Clawdbot instances exposed on the public internet, some without authentication enabled. The researcher demonstrated that the ClawdBot’s authentication mechanism could be bypassed due to a proxy misconfiguration.

With access to ClawdBot’s control interface, they were then able to access ClawdBot’s configuration, which contained credentials to a variety of other services. Across various exposed instances of ClawdBot, they identified Anthropic API Keys, Telegram Bot Tokens, Slack Oauth Credentials, and Signal Device Linking URIs. The researcher prompted ClawdBot directly via the chat interface, which led to exposure of its system prompt. They were also able to get ClawdBot to execute commands via it’s bash skill, which at least in once instance led to root access in the ClawdBot container.

The researcher noted a broad range of other impacts they could have had with this level of access, including:

Manipulation of user chat history with the ClawdBot AI agent
Exfiltration of conversation histories of any connected messaging services
Impersonation of users by sending messages on their behalf via connected messaging services

References

Date2026-01-25

exercise

Data Exfiltration via Remote Poisoned MCP Tool

Researchers at Invariant Labs demonstrated that AI agents configured with remote Model Context Protocol (MCP) Tools can be vulnerable to model poisoning attacks. They show that an MCP Tool can contain malicious prompts in its docstring description, which is ingested into the AI agent’s context, modifying its behavior.

They demonstrate this attack with a proof-of-concept MCP Tool that instructs the agent to perform additional actions before using the tool. The agent is instructed to read files containing credentials from the victim’s machine and store their contents in one of the input variables to the tool. When the tool runs, the victim’s credentials are exfiltrated to the poisoned MCP server.

Date2025-04-01

exercise

Rules File Backdoor: Supply Chain Attack on AI Coding Assistants

Pillar Security researchers demonstrated how adversaries can compromise AI-generated code by injecting malicious instructions into rules files used to configure AI coding assistants like Cursor and GitHub Copilot. The attack uses invisible Unicode characters to hide malicious prompts that manipulate the AI to insert backdoors, vulnerabilities, or malicious scripts into generated code. These poisoned rules files are distributed through open-source repositories and developer communities, creating a scalable supply chain attack that could affect millions of developers and end users through compromised software.

Vendor Response to Responsible Disclosure:

Cursor: Determined that this risk falls under the users’ responsibility.
GitHub Copilot: Implemented a new security feature that displays a warning when a file's contents include hidden Unicode text on github.com.

Date2025-03-18

exercise

Showing 4 of 10

Source evidence

Original public records and references for this page.

View all sources →

Original source

Original source links

Open the public records and source datasets used for this page.

Repositoryhttps://github.com/mitre-atlas/atlas-data ATLAS.yamlhttps://github.com/mitre-atlas/atlas-data/blob/main/dist/ATLAS.yaml Schemahttps://github.com/mitre-atlas/atlas-data/blob/main/dist/schemas/atlas_output_schema.json

User Harm - AI Security Technique

Overview

Technique details

Attack flow

Impact

Mitigations

Case studies

Model Distillation Campaigns Targeting Anthropic Claude

References

Exposed ClawdBot Control Interfaces Leads to Credential Access and Execution

References

Data Exfiltration via Remote Poisoned MCP Tool

Rules File Backdoor: Supply Chain Attack on AI Coding Assistants

Related risks

Vulnerabilities

Source evidence

Original source links