Record summary
A quick snapshot of what this page covers.
Attack context
How this AI attack works in practice.
Adversaries may manipulate the memory of a large language model (LLM) in order to persist changes to the LLM to future chat sessions.
Memory is a common feature in LLMs that allows them to remember information across chat sessions by utilizing a user-specific database. Because the memory is controlled via normal conversations with the user (e.g. "remember my preference for ...") an adversary can inject memories via Direct or Indirect Prompt Injection. Memories may contain malicious instructions (e.g. instructions that leak private conversations) or may promote the adversary's hidden agenda (e.g. manipulating the user).
- ATLAS ID
- AML.T0080.000
- Priority score
- 108
Mitigations
Defenses that may help against this attack.
AML.M0031 - Memory Hardening
Memory hardening can help protect LLM memory from manipulation and prevent poisoned memories from executing.
Case studies
Examples from public reports and exercises.
AIKatz: Attacking LLM Desktop Applications
Researchers at Lumia have demonstrated that it is possible to extract authentication tokens from the memory of LLM Desktop Applications. An attacker could then use those tokens to impersonate as the victim to the LLM backed, thereby gaining access to the victim’s conversations as well as the ability to interfere in future conversations. The attacker’s access would allow them the ability to directly inject prompts to change the LLM’s behavior, poison the LLM’s context to have persistent effects, manipulate the user’s conversation history to cover their tracks, and ultimately impact the confidentiality, integrity, and availability of the system. The researchers demonstrated this on Anthropic Claude, Microsoft M365 Copilot, and OpenAI ChatGPT.
Vendor Responses to Responsible Disclosure:
- Anthropic (HackerOne) - Closed as informational since local attack.
- Microsoft Security Response Center - Attack doesn’t bypass security boundaries for CVE.
- OpenAI (BugCrowd) - Closed as informational and noted that it’s up to Microsoft to patch this behavior.
Hacking ChatGPT’s Memories with Prompt Injection
Embrace the Red demonstrated that ChatGPT’s memory feature is vulnerable to manipulation via prompt injections. To execute the attack, the researcher hid a prompt injection in a shared Google Doc. When a user references the document, its contents is placed into ChatGPT’s context via the Connected App feature, and the prompt is executed, poisoning the memory with false facts. The researcher demonstrated that these injected memories persist across chat sessions. Additionally, since the prompt injection payload is introduced through shared resources, this leaves others vulnerable to the same attack and maintains persistence on the system.
Source
Where this page information comes from.
Original source
Original source links
Open the public records and source datasets used for this page.