APromptRiskDBThreat intelligence atlas
AI Security Technique

Memory - AI Security Technique

Adversaries may manipulate the memory of a large language model (LLM) in order to persist changes to the LLM to future chat sessions. Memory is a common feature in LLMs that allows them to remember information across chat sessions by utilizing a user-specific database. Because the memory is controlled via normal conversations with the user (e.g. "remember my preference for ...") an adversary can inject memories vi...

AI Security Techniquedemonstrated

Record summary

A quick snapshot of what this page covers.

Tactics0Attacker goals connected to this method.
Mitigations1Defenses that may help against this attack.
AI risks13Research-backed risks connected to this topic.

Attack context

How this AI attack works in practice.

Adversaries may manipulate the memory of a large language model (LLM) in order to persist changes to the LLM to future chat sessions.

Memory is a common feature in LLMs that allows them to remember information across chat sessions by utilizing a user-specific database. Because the memory is controlled via normal conversations with the user (e.g. "remember my preference for ...") an adversary can inject memories via Direct or Indirect Prompt Injection. Memories may contain malicious instructions (e.g. instructions that leak private conversations) or may promote the adversary's hidden agenda (e.g. manipulating the user).

ATLAS ID
AML.T0080.000
Priority score
108
Maturity: demonstrated

Mitigations

Defenses that may help against this attack.

AML.M0031 - Memory Hardening

ML Model EngineeringDeployment+1 more
LifecycleML Model Engineering + 2 moreCategoryTechnical - ML

Memory hardening can help protect LLM memory from manipulation and prevent poisoned memories from executing.

Case studies

Examples from public reports and exercises.

AIKatz: Attacking LLM Desktop Applications

exercise
Date2025-01-01

Researchers at Lumia have demonstrated that it is possible to extract authentication tokens from the memory of LLM Desktop Applications. An attacker could then use those tokens to impersonate as the victim to the LLM backed, thereby gaining access to the victim’s conversations as well as the ability to interfere in future conversations. The attacker’s access would allow them the ability to directly inject prompts to change the LLM’s behavior, poison the LLM’s context to have persistent effects, manipulate the user’s conversation history to cover their tracks, and ultimately impact the confidentiality, integrity, and availability of the system. The researchers demonstrated this on Anthropic Claude, Microsoft M365 Copilot, and OpenAI ChatGPT.

Vendor Responses to Responsible Disclosure:

  • Anthropic (HackerOne) - Closed as informational since local attack.
  • Microsoft Security Response Center - Attack doesn’t bypass security boundaries for CVE.
  • OpenAI (BugCrowd) - Closed as informational and noted that it’s up to Microsoft to patch this behavior.

Hacking ChatGPT’s Memories with Prompt Injection

exercise
Date2024-02-01

Embrace the Red demonstrated that ChatGPT’s memory feature is vulnerable to manipulation via prompt injections. To execute the attack, the researcher hid a prompt injection in a shared Google Doc. When a user references the document, its contents is placed into ChatGPT’s context via the Connected App feature, and the prompt is executed, poisoning the memory with false facts. The researcher demonstrated that these injected memories persist across chat sessions. Additionally, since the prompt injection payload is introduced through shared resources, this leaves others vulnerable to the same attack and maintains persistence on the system.

Source

Where this page information comes from.