Record summary
A quick snapshot of what this page covers.
Attack context
How this AI attack works in practice.
- ATLAS ID
- AML.T0048.003
- Priority score
- 150
Mitigations
Defenses that may help against this attack.
Case studies
Examples from public reports and exercises.
Model Distillation Campaigns Targeting Anthropic Claude
Anthropic uncovered campaigns to extract Claude’s capabilities carried out by the three Chinese AI Labs: DeepSeek, Moonshot, and MiniMax. Collectively, these campaigns used approximately 24,000 accounts and 16 million queries. They used model distillation to train their own models on the outputs of Claude in an attempt to replicate Claude’s capabilities such as agentic reasoning, code generation, tool use, and computer use.
As outlined in Anthropic's report, model distillation was leveraged as a means for these labs to undermine Anthropic's export controls.[<sup>\[1\]</sup>][1] Distilled models lack the safeguards that prevent bad actors from using frontier models for malicious purposes such as the bioweapon development, disinformation, offensive cyber operations, and mass surveillance.
References
Exposed ClawdBot Control Interfaces Leads to Credential Access and Execution
A security researcher identified hundreds of exposed ClawdBot control interfaces on the public internet. ClawdBot (now OpenClaw) “is a personal AI assistant you run on your own devices. It answers you on the channels you already use … , plus extension channels. … It can speak and listen on macOS/iOS/Android, and can render a live Canvas you control.”[<sup>\[1\]</sup>][1] The researcher was able to access credentials to a variety of connected applications via ClawdBot’s configuration file. They were also able to invoke ClawdBot’s skills by prompting it via the chat interface, leading to root access in the container.
The researcher searched Shodan[<sup>\[2\]</sup>][2] to identify Clawdbot instances exposed on the public internet, some without authentication enabled. The researcher demonstrated that the ClawdBot’s authentication mechanism could be bypassed due to a proxy misconfiguration.
With access to ClawdBot’s control interface, they were then able to access ClawdBot’s configuration, which contained credentials to a variety of other services. Across various exposed instances of ClawdBot, they identified Anthropic API Keys, Telegram Bot Tokens, Slack Oauth Credentials, and Signal Device Linking URIs. The researcher prompted ClawdBot directly via the chat interface, which led to exposure of its system prompt. They were also able to get ClawdBot to execute commands via it’s bash skill, which at least in once instance led to root access in the ClawdBot container.
The researcher noted a broad range of other impacts they could have had with this level of access, including:
- Manipulation of user chat history with the ClawdBot AI agent
- Exfiltration of conversation histories of any connected messaging services
- Impersonation of users by sending messages on their behalf via connected messaging services
References
Data Exfiltration via Remote Poisoned MCP Tool
Researchers at Invariant Labs demonstrated that AI agents configured with remote Model Context Protocol (MCP) Tools can be vulnerable to model poisoning attacks. They show that an MCP Tool can contain malicious prompts in its docstring description, which is ingested into the AI agent’s context, modifying its behavior.
They demonstrate this attack with a proof-of-concept MCP Tool that instructs the agent to perform additional actions before using the tool. The agent is instructed to read files containing credentials from the victim’s machine and store their contents in one of the input variables to the tool. When the tool runs, the victim’s credentials are exfiltrated to the poisoned MCP server.
Rules File Backdoor: Supply Chain Attack on AI Coding Assistants
Pillar Security researchers demonstrated how adversaries can compromise AI-generated code by injecting malicious instructions into rules files used to configure AI coding assistants like Cursor and GitHub Copilot. The attack uses invisible Unicode characters to hide malicious prompts that manipulate the AI to insert backdoors, vulnerabilities, or malicious scripts into generated code. These poisoned rules files are distributed through open-source repositories and developer communities, creating a scalable supply chain attack that could affect millions of developers and end users through compromised software.
Vendor Response to Responsible Disclosure:
- Cursor: Determined that this risk falls under the users’ responsibility.
- GitHub Copilot: Implemented a new security feature that displays a warning when a file's contents include hidden Unicode text on github.com.
AIKatz: Attacking LLM Desktop Applications
Researchers at Lumia have demonstrated that it is possible to extract authentication tokens from the memory of LLM Desktop Applications. An attacker could then use those tokens to impersonate as the victim to the LLM backed, thereby gaining access to the victim’s conversations as well as the ability to interfere in future conversations. The attacker’s access would allow them the ability to directly inject prompts to change the LLM’s behavior, poison the LLM’s context to have persistent effects, manipulate the user’s conversation history to cover their tracks, and ultimately impact the confidentiality, integrity, and availability of the system. The researchers demonstrated this on Anthropic Claude, Microsoft M365 Copilot, and OpenAI ChatGPT.
Vendor Responses to Responsible Disclosure:
- Anthropic (HackerOne) - Closed as informational since local attack.
- Microsoft Security Response Center - Attack doesn’t bypass security boundaries for CVE.
- OpenAI (BugCrowd) - Closed as informational and noted that it’s up to Microsoft to patch this behavior.
ChatGPT Package Hallucination
Researchers identified that large language models such as ChatGPT can hallucinate fake software package names that are not published to a package repository. An attacker could publish a malicious package under the hallucinated name to a package repository. Then users of the same or similar large language models may encounter the same hallucination and ultimately download and execute the malicious package leading to a variety of potential harms.
Morris II Worm: RAG-Based Attack
Researchers developed Morris II, a zero-click worm designed to attack generative AI (GenAI) ecosystems and propagate between connected GenAI systems. The worm uses an adversarial self-replicating prompt which uses prompt injection to replicate the prompt as output and perform malicious activity. The researchers demonstrate how this worm can propagate through an email system with a RAG-based assistant. They use a target system that automatically ingests received emails, retrieves past correspondences, and generates a reply for the user. To carry out the attack, they send a malicious email containing the adversarial self-replicating prompt, which ends up in the RAG database. The malicious instructions in the prompt tell the assistant to include sensitive user data in the response. Future requests to the email assistant may retrieve the malicious email. This leads to propagation of the worm due to the self-replicating portion of the prompt, as well as leaking private information due to the malicious instructions.
Hacking ChatGPT’s Memories with Prompt Injection
Embrace the Red demonstrated that ChatGPT’s memory feature is vulnerable to manipulation via prompt injections. To execute the attack, the researcher hid a prompt injection in a shared Google Doc. When a user references the document, its contents is placed into ChatGPT’s context via the Connected App feature, and the prompt is executed, poisoning the memory with false facts. The researcher demonstrated that these injected memories persist across chat sessions. Additionally, since the prompt injection payload is introduced through shared resources, this leaves others vulnerable to the same attack and maintains persistence on the system.
Google Bard Conversation Exfiltration
Embrace the Red demonstrated that Bard users' conversations could be exfiltrated via an indirect prompt injection. To execute the attack, a threat actor shares a Google Doc containing the prompt with the target user who then interacts with the document via Bard to inadvertently execute the prompt. The prompt causes Bard to respond with the markdown for an image, whose URL has the user's conversation secretly embedded. Bard renders the image for the user, creating an automatic request to an adversary-controlled script and exfiltrating the user's conversation. The request is not blocked by Google's Content Security Policy (CSP), because the script is hosted as a Google Apps Script with a Google-owned domain.
Note: Google has fixed this vulnerability. The CSP remains the same, and Bard can still render images for the user, so there may be some filtering of data embedded in URLs.
ChatGPT Conversation Exfiltration
Embrace the Red demonstrated that ChatGPT users' conversations can be exfiltrated via an indirect prompt injection. To execute the attack, a threat actor uploads a malicious prompt to a public website, where a ChatGPT user may interact with it. The prompt causes ChatGPT to respond with the markdown for an image, whose URL has the user's conversation secretly embedded. ChatGPT renders the image for the user, creating a automatic request to an adversary-controlled script and exfiltrating the user's conversation. Additionally, the researcher demonstrated how the prompt can execute other plugins, opening them up to additional harms.
Indirect Prompt Injection Threats: Bing Chat Data Pirate
Whenever interacting with Microsoft's new Bing Chat LLM Chatbot, a user can allow Bing Chat permission to view and access currently open websites throughout the chat session. Researchers demonstrated the ability for an attacker to plant an injection in a website the user is visiting, which silently turns Bing Chat into a Social Engineer who seeks out and exfiltrates personal information. The user doesn't have to ask about the website or do anything except interact with Bing Chat while the website is opened in the browser in order for this attack to be executed.
In the provided demonstration, a user opened a prepared malicious website containing an indirect prompt injection attack (could also be on a social media site) in Edge. The website includes a prompt which is read by Bing and changes its behavior to access user information, which in turn can sent to an attacker.
Attempted Evasion of ML Phishing Webpage Detection System
Adversaries create phishing websites that appear visually similar to legitimate sites. These sites are designed to trick users into entering their credentials, which are then sent to the bad actor. To combat this behavior, security companies utilize AI/ML-based approaches to detect phishing sites and block them in their endpoint security products.
In this incident, adversarial examples were identified in the logs of a commercial machine learning phishing website detection system. The detection system makes an automated block/allow determination from the "phishing score" of an ensemble of image classifiers each responsible for different phishing indicators (visual similarity, input form detection, etc.). The adversarial examples appeared to employ several simple yet effective strategies for manually modifying brand logos in an attempt to evade image classification models. The phishing websites which employed logo modification methods successfully evaded the model responsible detecting brand impersonation via visual similarity. However, the other components of the system successfully flagged the phishing websites.
Source
Where this page information comes from.
Original source
Original source links
Open the public records and source datasets used for this page.