APromptRiskDBThreat intelligence atlas
AI Security Technique

LLM Data Leakage - AI Security Technique

Adversaries may craft prompts that induce the LLM to leak sensitive information. This can include private user data or proprietary information. The leaked information may come from proprietary training data, data sources the LLM is connected to, or information from other users of the LLM.

AI Security TechniquedemonstratedExfiltration

Record summary

A quick snapshot of what this page covers.

Tactics1Attacker goals connected to this method.
Mitigations4Defenses that may help against this attack.
AI risks0Research-backed risks connected to this topic.

Attack context

How this AI attack works in practice.

ATLAS ID
AML.T0057
Priority score
42
Maturity: demonstrated
Exfiltration

Mitigations

Defenses that may help against this attack.

AML.M0020 - Generative AI Guardrails

ML Model EngineeringML Model Evaluation+1 more
LifecycleML Model Engineering + 2 moreCategoryTechnical - ML

Guardrails can detect sensitive data and PII in model outputs.

AML.M0021 - Generative AI Guidelines

ML Model EngineeringML Model Evaluation+1 more
LifecycleML Model Engineering + 2 moreCategoryTechnical - ML

Model guidelines can instruct the model to refuse a response to unsafe inputs.

AML.M0022 - Generative AI Model Alignment

ML Model EngineeringML Model Evaluation+1 more
LifecycleML Model Engineering + 2 moreCategoryTechnical - ML

Model alignment can improve the parametric safety of a model by guiding it away from unsafe prompts and responses.

AML.M0008 - Validate AI Model

ML Model EvaluationMonitoring and Maintenance
LifecycleML Model Evaluation + 1 moreCategoryTechnical - ML

Robust evaluation of an AI model can be used to detect privacy concerns, data leakage, and potential for revealing sensitive information.

Case studies

Examples from public reports and exercises.

Morris II Worm: RAG-Based Attack

exercise
Date2024-03-05

Researchers developed Morris II, a zero-click worm designed to attack generative AI (GenAI) ecosystems and propagate between connected GenAI systems. The worm uses an adversarial self-replicating prompt which uses prompt injection to replicate the prompt as output and perform malicious activity. The researchers demonstrate how this worm can propagate through an email system with a RAG-based assistant. They use a target system that automatically ingests received emails, retrieves past correspondences, and generates a reply for the user. To carry out the attack, they send a malicious email containing the adversarial self-replicating prompt, which ends up in the RAG database. The malicious instructions in the prompt tell the assistant to include sensitive user data in the response. Future requests to the email assistant may retrieve the malicious email. This leads to propagation of the worm due to the self-replicating portion of the prompt, as well as leaking private information due to the malicious instructions.

Source

Where this page information comes from.