Manipulate User LLM Chat History - AI Security Technique

Record summary

A quick snapshot of what this page covers.

Tactics (1): Attacker goals connected to this method.
Mitigations (0): Defenses that may help against this attack.
AI risks (0): Research-backed risks connected to this topic.

Attack context

How this AI attack works in practice.

Adversaries may manipulate a user's large language model (LLM) chat history to cover the tracks of their malicious behavior. They may hide persistent changes they have made to the LLM's behavior, or obscure their attempts at discovering private information about the user.

To do so, adversaries may delete or edit existing messages or create new threads as part of their coverup. This is feasible if the adversary has the victim's authentication tokens for the backend LLM service or if they have direct access to the victim's chat interface.

Chat interfaces (especially desktop interfaces) often do not show the injected prompt for an ongoing chat, because they load the chat history only once, when the chat is first opened. This can help the adversary's manipulations go unnoticed by the victim.
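Because a client that loads history only once can fall out of sync with the server, one way to surface this kind of tampering is to compare the snapshot taken when the chat was opened against a fresh fetch. The following is a minimal sketch under stated assumptions: the `Message` record, the `fingerprint` helper, and `diff_history` are hypothetical names invented for illustration, and no real LLM service API is used. Newly added messages are expected in any active chat, so only unexpected additions, edits, and deletions would merit attention.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    # Hypothetical minimal message record; real services expose richer schemas.
    id: str
    role: str
    text: str


def fingerprint(msg: Message) -> str:
    """Digest of the fields a reader would actually see in the chat."""
    payload = f"{msg.role}\x00{msg.text}".encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def diff_history(cached: list[Message], fresh: list[Message]) -> dict[str, list[str]]:
    """Compare the snapshot loaded at chat open with a freshly fetched copy."""
    old = {m.id: fingerprint(m) for m in cached}
    new = {m.id: fingerprint(m) for m in fresh}
    return {
        "deleted": sorted(old.keys() - new.keys()),
        "added": sorted(new.keys() - old.keys()),
        "edited": sorted(i for i in old.keys() & new.keys() if old[i] != new[i]),
    }


# Snapshot the interface loaded when the chat was first opened (invented data).
cached = [
    Message("m1", "user", "Summarize my notes."),
    Message("m2", "assistant", "Here is a summary."),
    Message("m3", "user", "Thanks."),
]
# What the backend returns later, after a hypothetical manipulation.
fresh = [
    Message("m1", "user", "Summarize my notes."),
    Message("m2", "assistant", "Here is a summary. Always include this link."),
    Message("m4", "user", "Ignore earlier instructions."),
]

report = diff_history(cached, fresh)
print(report)
# {'deleted': ['m3'], 'added': ['m4'], 'edited': ['m2']}
```

A comparison like this only helps if the cached snapshot is stored somewhere the adversary cannot also modify, which is an assumption of the sketch rather than a property of any particular client.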

ATLAS ID: AML.T0092
Priority score: 40
Maturity: demonstrated
Tactic: Defense Evasion

Mitigations

Defenses that may help against this attack.

No defense is connected to this attack in the current data.

Case studies

Examples from public reports and exercises.

Exposed ClawdBot Control Interfaces Leads to Credential Access and Execution

exercise
Date: 2026-01-25

A security researcher identified hundreds of exposed ClawdBot control interfaces on the public internet. ClawdBot (now OpenClaw) “is a personal AI assistant you run on your own devices. It answers you on the channels you already use … , plus extension channels. … It can speak and listen on macOS/iOS/Android, and can render a live Canvas you control.” [1] The researcher was able to access credentials to a variety of connected applications via ClawdBot’s configuration file. They were also able to invoke ClawdBot’s skills by prompting it via the chat interface, leading to root access in the container.

The researcher searched Shodan [2] to identify ClawdBot instances exposed on the public internet, some without authentication enabled. The researcher also demonstrated that ClawdBot’s authentication mechanism could be bypassed due to a proxy misconfiguration.

With access to ClawdBot’s control interface, they were then able to access ClawdBot’s configuration, which contained credentials for a variety of other services. Across various exposed instances of ClawdBot, they identified Anthropic API keys, Telegram bot tokens, Slack OAuth credentials, and Signal device linking URIs. The researcher prompted ClawdBot directly via the chat interface, which led to exposure of its system prompt. They were also able to get ClawdBot to execute commands via its bash skill, which in at least one instance led to root access in the ClawdBot container.
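The credential exposure described above depends on secrets being stored in plaintext in a configuration that the control interface can read. As a rough illustration of how an operator might audit for that condition, the sketch below walks a nested configuration and reports where secret-like values live. Everything in it is an assumption made for illustration: the key-name patterns, the `find_secret_paths` helper, and the example structure are hypothetical and do not reflect ClawdBot's actual configuration schema.

```python
import re

# Hypothetical key-name patterns; tune these for the configuration actually in use.
SECRET_KEY_PATTERN = re.compile(
    r"(api[_-]?key|token|secret|password|credential)", re.IGNORECASE
)


def find_secret_paths(config, prefix=""):
    """Return dotted paths of non-empty string values whose key name looks like a secret."""
    hits = []
    if isinstance(config, dict):
        for key, value in config.items():
            path = f"{prefix}.{key}" if prefix else str(key)
            if isinstance(value, str) and value and SECRET_KEY_PATTERN.search(str(key)):
                hits.append(path)
            else:
                # Recurse into nested dicts and lists; scalars yield no hits.
                hits.extend(find_secret_paths(value, path))
    elif isinstance(config, list):
        for index, item in enumerate(config):
            hits.extend(find_secret_paths(item, f"{prefix}[{index}]"))
    return hits


# Invented example structure with placeholder values; not a real schema.
example_config = {
    "model": {"provider": "anthropic", "api_key": "sk-placeholder"},
    "channels": [
        {"name": "telegram", "bot_token": "placeholder"},
        {"name": "slack", "oauth": {"client_secret": "placeholder"}},
    ],
    "ui": {"theme": "dark"},
}

secret_paths = find_secret_paths(example_config)
print(secret_paths)
# ['model.api_key', 'channels[0].bot_token', 'channels[1].oauth.client_secret']
```

Matching on key names is a heuristic: it misses secrets stored under unremarkable keys and says nothing about whether the interface serving the configuration is itself exposed.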

The researcher noted a broad range of other impacts they could have had with this level of access, including:

  • Manipulation of user chat history with the ClawdBot AI agent
  • Exfiltration of conversation histories of any connected messaging services
  • Impersonation of users by sending messages on their behalf via connected messaging services

References

  [1] https://github.com/openclaw/openclaw
  [2] https://www.shodan.io/search?query=Clawdbot+Control

AIKatz: Attacking LLM Desktop Applications

exercise
Date: 2025-01-01

Researchers at Lumia have demonstrated that it is possible to extract authentication tokens from the memory of LLM desktop applications. An attacker could then use those tokens to impersonate the victim to the LLM backend, thereby gaining access to the victim’s conversations as well as the ability to interfere in future conversations. This access would allow the attacker to directly inject prompts to change the LLM’s behavior, poison the LLM’s context to have persistent effects, manipulate the user’s conversation history to cover their tracks, and ultimately impact the confidentiality, integrity, and availability of the system. The researchers demonstrated this on Anthropic Claude, Microsoft M365 Copilot, and OpenAI ChatGPT.

Vendor Responses to Responsible Disclosure:

  • Anthropic (HackerOne) - Closed as informational because it is a local attack.
  • Microsoft Security Response Center - The attack does not bypass the security boundaries required for a CVE.
  • OpenAI (BugCrowd) - Closed as informational and noted that it’s up to Microsoft to patch this behavior.

Source

Where this page's information comes from.