Hacking ChatGPT’s Memories with Prompt Injection - AI Case Study

AI Case Study

Embrace the Red demonstrated that ChatGPT’s memory feature is vulnerable to manipulation via prompt injections. To execute the attack, the researcher hid a prompt injection in a shared Google Doc. When a user references the document, its contents is placed into ChatGPT’s context via the Connected App feature, and the prompt is executed, poisoning the memory with false facts. The...

Overview

Case steps7Steps described in the case record.

Techniques6Attack methods mentioned in the case steps.

Linked CVEs0Known vulnerabilities mentioned in the record.

Risk patterns

Patterns found in the case record and its linked vulnerabilities.

1Dominant ATLAS tactic. Persistence appears in 2 case steps.
2Multiple attack methods. The case connects to 6 unique AI attack methods.

Procedure timeline

Search the case steps or filter them by attacker goal.

Persistence2Resource Development1Defense Evasion1Initial Access1Execution1Impact1

Step 1
LLM Prompt Crafting
Resource Development

The researcher crafted a basic prompt asking to set the memory context with a bulleted list of incorrect facts.
Step 2
LLM Prompt Obfuscation
Defense Evasion

The researcher placed the prompt in a Google Doc hidden in the header with tiny font matching the document’s background color to make it invisible.
Step 3
Prompt Infiltration via Public-Facing Application
Initial Access

The Google Doc was shared with the victim, making it accessible to ChatGPT’s via its Connected App feature.
Step 4
Indirect
Execution

When a user referenced something in the shared document, its contents was added to the chat context, and the prompt was executed by ChatGPT.
Step 5
Memory
Persistence

The prompt caused new memories to be introduced, changing the behavior of ChatGPT. The chat window indicated that the memory has been set, despite the lack of human verification or intervention. All future chat sessions will use the poisoned memory store.
Step 6
Prompt Infiltration via Public-Facing Application
Persistence

The memory poisoning prompt injection persists in the shared Google Doc, where it can spread to other users and chat sessions, making it difficult to trace sources of the memories and remove.
Step 7
User Harm
Impact

The victim can be misinformed, misled, or influenced as directed by ChatGPT's poisoned memories.

Mitigations

Defenses connected to the attack methods in this case.

3 recordsView all mitigations →

AI Telemetry Logging

Implement logging of inputs and outputs of deployed AI models. When deploying AI agents, implement logging of the intermediate steps of agentic actions and decisions, data access and tool use, installation commands, and identity of the agent. Monitoring logs can help to detect security threats and mitigate impacts.

Additionally, having logging enabled can discourage adversaries who want to remain undetected from utilizing AI resources.

Input and Output Validation for AI Agent Components

Implement validation on inputs and outputs for the tools and data sources used by AI agents. Validation includes enforcing a common data format, schema validation, checks for sensitive or prohibited information leakage, and data sanitization to remove potential injections or unsafe code. Input and output validation can help prevent compromises from spreading in AI-enabled systems and can help secure the workflow when multiple components are chained together. Validation should be performed external to the AI agent.

Memory Hardening

Memory Hardening involves developing trust boundaries and secure processes for how an AI agent stores and accesses memory and context. This may be implemented using a combination of strategies including restricting an agent's ability to store memories by requiring external authentication and validation for memory updates, performing semantic integrity checks on retrieved memories before agents execute actions, and implementing controls for monitoring of memory and remediation processes for poisoned memory.

Source evidence

Original public records and references for this case.

View all sources →

Original source

Original source links

Open the MITRE ATLAS data and public references used for this case study.

Repositoryhttps://github.com/mitre-atlas/atlas-data ATLAS.yamlhttps://github.com/mitre-atlas/atlas-data/blob/main/dist/ATLAS.yaml Schemahttps://github.com/mitre-atlas/atlas-data/blob/main/dist/schemas/atlas_output_schema.json ChatGPT: Hacking Memories with Prompt Injectionhttps://embracethered.com/blog/posts/2024/chatgpt-hacking-memories/