LLM Data Leakage - AI Security Technique

AI Security Technique

Adversaries may craft prompts that induce the LLM to leak sensitive information. This can include private user data or proprietary information. The leaked information may come from proprietary training data, data sources the LLM is connected to, or information from other users of the LLM.

Overview

A source-backed snapshot of this AI security technique.

Tactics1Attacker goals connected to this method.

Mitigations4Defenses that may help against this attack.

AI risks0Research-backed risks connected to this topic.

Technique details

Identifiers, maturity, and source taxonomy for this technique.

ATLAS ID: AML.T0057
Maturity: demonstrated
Priority score: 42

ATLAS tactics

Exfiltration

Attack flow

How to read the public records connected to this technique.

1. TechniqueRead the ATLAS description and evidence level.

2. TacticsSee which attacker goals this method supports.

3. ExamplesCheck whether public case studies mention it.

4. DefensesReview safeguards mapped by ATLAS.

5. SourcesOpen the original public records and references.

Impact

Why this technique may deserve attention in the current dataset.

Evidence leveldemonstrated
Mapped defenses4 ATLAS mitigation records
Public examples1 linked case study records
Research risks0 related MIT AI Risk records above the confidence threshold
Vulnerabilities0 linked CVE records

Mitigations

Defenses that may help against this attack.

4 recordsView all mitigations →

AML.M0020 - Generative AI Guardrails

Guardrails can detect sensitive data and PII in model outputs.

LifecycleML Model Engineering + 2 moreCategoryTechnical - ML

ML Model EngineeringML Model Evaluation+1 more

AML.M0021 - Generative AI Guidelines

Model guidelines can instruct the model to refuse a response to unsafe inputs.

LifecycleML Model Engineering + 2 moreCategoryTechnical - ML

ML Model EngineeringML Model Evaluation+1 more

AML.M0022 - Generative AI Model Alignment

Model alignment can improve the parametric safety of a model by guiding it away from unsafe prompts and responses.

LifecycleML Model Engineering + 2 moreCategoryTechnical - ML

ML Model EngineeringML Model Evaluation+1 more

AML.M0008 - Validate AI Model

Robust evaluation of an AI model can be used to detect privacy concerns, data leakage, and potential for revealing sensitive information.

LifecycleML Model Evaluation + 1 moreCategoryTechnical - ML

ML Model EvaluationMonitoring

Case studies

Examples from public reports and exercises.

1 recordView all case studies →

Morris II Worm: RAG-Based Attack

Researchers developed Morris II, a zero-click worm designed to attack generative AI (GenAI) ecosystems and propagate between connected GenAI systems. The worm uses an adversarial self-replicating prompt which uses prompt injection to replicate the prompt as output and perform malicious activity. The researchers demonstrate how this worm can propagate through an email system with a RAG-based assistant. They use a target system that automatically ingests received emails, retrieves past correspondences, and generates a reply for the user. To carry out the attack, they send a malicious email containing the adversarial self-replicating prompt, which ends up in the RAG database. The malicious instructions in the prompt tell the assistant to include sensitive user data in the response. Future requests to the email assistant may retrieve the malicious email. This leads to propagation of the worm due to the self-replicating portion of the prompt, as well as leaking private information due to the malicious instructions.

Date2024-03-05

exercise

Source evidence

Original public records and references for this page.

View all sources →

Original source

Original source links

Open the public records and source datasets used for this page.

Repositoryhttps://github.com/mitre-atlas/atlas-data ATLAS.yamlhttps://github.com/mitre-atlas/atlas-data/blob/main/dist/ATLAS.yaml Schemahttps://github.com/mitre-atlas/atlas-data/blob/main/dist/schemas/atlas_output_schema.json

LLM Data Leakage - AI Security Technique

Overview

Technique details

Attack flow

Impact

Mitigations

AML.M0020 - Generative AI Guardrails

AML.M0021 - Generative AI Guidelines

AML.M0022 - Generative AI Model Alignment

AML.M0008 - Validate AI Model

Case studies

Morris II Worm: RAG-Based Attack

Related risks

Vulnerabilities

Source evidence

Original source links