ChatGPT Conversation Exfiltration - AI Case Study

AI Case Study

Embrace the Red demonstrated that ChatGPT users' conversations can be exfiltrated via an indirect prompt injection. To execute the attack, a threat actor uploads a malicious prompt to a public website, where a ChatGPT user may interact with it. The prompt causes ChatGPT to respond with the markdown for an image, whose URL has the user's conversation secretly embedded. ChatGPT ren...

Overview

Case steps7Steps described in the case record.

Techniques7Attack methods mentioned in the case steps.

Linked CVEs0Known vulnerabilities mentioned in the record.

Risk patterns

Patterns found in the case record and its linked vulnerabilities.

1Dominant ATLAS tactic. Resource Development appears in 2 case steps.
2Multiple attack methods. The case connects to 7 unique AI attack methods.

Procedure timeline

Search the case steps or filter them by attacker goal.

Resource Development2Initial Access1Execution1Exfiltration1Privilege Escalation1Impact1

Step 1
LLM Prompt Crafting
Resource Development

The researcher developed a prompt that causes ChatGPT to include a Markdown element for an image with the user's conversation embedded in the URL as part of its responses.
Step 2
Stage Capabilities
Resource Development

The researcher included the prompt in a webpage, where it could be retrieved by ChatGPT.
Step 3
Drive-by Compromise
Initial Access

When the user makes a query that causes ChatGPT to retrieve the webpage using its WebPilot plugin, it ingests the adversary's prompt.
Step 4
Indirect
Execution

The prompt injection is executed, causing ChatGPT to include a Markdown element for an image hosted on an adversary-controlled server and embed the user's chat history as query parameter in the URL.
Step 5
LLM Response Rendering
Exfiltration

ChatGPT automatically renders the image for the user, making the request to the adversary's server for the image contents, and exfiltrating the user's conversation.
Step 6
AI Agent Tool Invocation
Privilege Escalation

Additionally, the prompt can cause the LLM to execute other plugins that do not match a user request. In this instance, the researcher demonstrated the WebPilot plugin making a call to the Expedia plugin.
Step 7
User Harm
Impact

The user's privacy is violated, and they are potentially open to further targeted attacks.

Mitigations

Defenses connected to the attack methods in this case.

Top 10 of 11View all mitigations →

AI Agent Tools Permissions Configuration

When deploying tools that will be shared across multiple AI agents, it is important to implement robust policies and controls on permissions for the tools. These controls include applying the principle of least privilege along with delegated access, where the tools receive the permissions, identities, and restrictions of the AI agent calling them. These configurations may be implemented either in MCP servers which connect the agents to the tools calling them or, in more complex cases, directly in the configuration files of the tool.

AI Telemetry Logging

Implement logging of inputs and outputs of deployed AI models. When deploying AI agents, implement logging of the intermediate steps of agentic actions and decisions, data access and tool use, installation commands, and identity of the agent. Monitoring logs can help to detect security threats and mitigate impacts.

Additionally, having logging enabled can discourage adversaries who want to remain undetected from utilizing AI resources.

Generative AI Guardrails

Guardrails are safety controls that are placed between a generative AI model and the output shared with the user to prevent undesired inputs and outputs. Guardrails can take the form of validators such as filters, rule-based logic, or regular expressions, as well as AI-based approaches, such as classifiers and utilizing LLMs, or named entity recognition (NER) to evaluate the safety of the prompt or response. Domain specific methods can be employed to reduce risks in a variety of areas such as etiquette, brand damage, jailbreaking, false information, code exploits, SQL injections, and data leakage.

Generative AI Guidelines

Guidelines are safety controls that are placed between user-provided input and a generative AI model to help direct the model to produce desired outputs and prevent undesired outputs.

Guidelines can be implemented as instructions appended to all user prompts or as part of the instructions in the system prompt. They can define the goal(s), role, and voice of the system, as well as outline safety and security parameters.

Showing 4 of 10

Source evidence

Original public records and references for this case.

View all sources →

Original source

Original source links

Open the MITRE ATLAS data and public references used for this case study.

Repositoryhttps://github.com/mitre-atlas/atlas-data ATLAS.yamlhttps://github.com/mitre-atlas/atlas-data/blob/main/dist/ATLAS.yaml Schemahttps://github.com/mitre-atlas/atlas-data/blob/main/dist/schemas/atlas_output_schema.json ChatGPT Plugins: Data Exfiltration via Images & Cross Plugin Request Forgeryhttps://embracethered.com/blog/posts/2023/chatgpt-webpilot-data-exfil-via-markdown-injection/