Memory - AI Security Technique

Overview

A source-backed snapshot of this AI security technique.

Adversaries may manipulate the memory of a large language model (LLM) in order to persist changes to the LLM to future chat sessions.

Memory is a common feature in LLMs that allows them to remember information across chat sessions by utilizing a user-specific database. Because the memory is controlled via normal conversations with the user (e.g. "remember my preference for ...") an adversary can inject memories via Direct or Indirect Prompt Injection. Memories may contain malicious instructions (e.g. instructions that leak private conversations) or may promote the adversary's hidden agenda (e.g. manipulating the user).

Tactics0Attacker goals connected to this method.

Mitigations1Defenses that may help against this attack.

AI risks13Research-backed risks connected to this topic.

Technique details

Identifiers, maturity, and source taxonomy for this technique.

ATLAS ID: AML.T0080.000
Maturity: demonstrated
Priority score: 108

Attack flow

How to read the public records connected to this technique.

1. TechniqueRead the ATLAS description and evidence level.

2. TacticsSee which attacker goals this method supports.

3. ExamplesCheck whether public case studies mention it.

4. DefensesReview safeguards mapped by ATLAS.

5. SourcesOpen the original public records and references.

Impact

Why this technique may deserve attention in the current dataset.

Evidence leveldemonstrated
Mapped defenses1 ATLAS mitigation records
Public examples2 linked case study records
Research risks13 related MIT AI Risk records above the confidence threshold
Vulnerabilities0 linked CVE records

Mitigations

Defenses that may help against this attack.

1 recordView all mitigations →

AML.M0031 - Memory Hardening

Memory hardening can help protect LLM memory from manipulation and prevent poisoned memories from executing.

LifecycleML Model Engineering + 2 moreCategoryTechnical - ML

ML Model EngineeringDeployment+1 more

Case studies

Examples from public reports and exercises.

2 recordsView all case studies →

AIKatz: Attacking LLM Desktop Applications

Researchers at Lumia have demonstrated that it is possible to extract authentication tokens from the memory of LLM Desktop Applications. An attacker could then use those tokens to impersonate as the victim to the LLM backed, thereby gaining access to the victim’s conversations as well as the ability to interfere in future conversations. The attacker’s access would allow them the ability to directly inject prompts to change the LLM’s behavior, poison the LLM’s context to have persistent effects, manipulate the user’s conversation history to cover their tracks, and ultimately impact the confidentiality, integrity, and availability of the system. The researchers demonstrated this on Anthropic Claude, Microsoft M365 Copilot, and OpenAI ChatGPT.

Vendor Responses to Responsible Disclosure:

Anthropic (HackerOne) - Closed as informational since local attack.
Microsoft Security Response Center - Attack doesn’t bypass security boundaries for CVE.
OpenAI (BugCrowd) - Closed as informational and noted that it’s up to Microsoft to patch this behavior.

Date2025-01-01

exercise

Hacking ChatGPT’s Memories with Prompt Injection

Embrace the Red demonstrated that ChatGPT’s memory feature is vulnerable to manipulation via prompt injections. To execute the attack, the researcher hid a prompt injection in a shared Google Doc. When a user references the document, its contents is placed into ChatGPT’s context via the Connected App feature, and the prompt is executed, poisoning the memory with false facts. The researcher demonstrated that these injected memories persist across chat sessions. Additionally, since the prompt injection payload is introduced through shared resources, this leaves others vulnerable to the same attack and maintains persistence on the system.

Date2024-02-01

exercise

Related risks

Research-backed risks connected to this topic.

Top 10 of 13View all risks →

Jailbreaks and Prompt Injections Threaten Security of LLMs

"LLMs are not adversarially robust and are vulnerable to security failures such as jailbreaks and prompt-injection attacks. While a number of jailbreak attacks have been proposed in the literature, the lack of standar...

Domain2. Privacy & SecuritySubdomain2.2 > AI system security vulnerabilities and attacks

Confidence0.78

Prompt leaking

"Prompt leaking is another type of prompt injection attack designed to expose details contained in private prompts. According to [58], prompt leaking is the act of misleading the model to print the pre-designed instru...

Domain2. Privacy & SecuritySubdomain2.2 > AI system security vulnerabilities and attacks

Confidence0.78

Prompt injection attack

"A prompt injection attack forces a generative model that takes a prompt as input to produce unexpected output by manipulating the structure, instructions, or information contained in its prompt."

Domain2. Privacy & SecuritySubdomain2.2 > AI system security vulnerabilities and attacks

Confidence0.78

Prompt injection

"Prompt Injections are a form of Adversarial Input that involve manipulating the text instructions given to a GenAI system (Liu et al., 2023). Prompt Injections exploit loopholes in a model’s architec- tures that have...

Domain2. Privacy & SecuritySubdomain2.2 > AI system security vulnerabilities and attacks

Confidence0.78

Showing 4 of 10