Extract LLM System Prompt - AI Security Technique

Overview

A source-backed snapshot of this AI security technique.

Tactics (1): Attacker goals connected to this method.

Mitigations (3): Defenses that may help against this attack.

AI risks (12): Research-backed risks connected to this topic.

Technique details

Identifiers, maturity, and source taxonomy for this technique.

ATLAS ID: AML.T0056
Maturity: feasible
Priority score: 79

ATLAS tactics

Exfiltration

Attack flow

How to read the public records connected to this technique.

1. Technique: Read the ATLAS description and evidence level.

2. Tactics: See which attacker goals this method supports.

3. Examples: Check whether public case studies mention it.

4. Defenses: Review safeguards mapped by ATLAS.

5. Sources: Open the original public records and references.

Impact

Why this technique may deserve attention in the current dataset.

Evidence level: feasible
Mapped defenses: 3 ATLAS mitigation records
Public examples: 0 linked case study records
Research risks: 12 related MIT AI Risk records above the confidence threshold
Vulnerabilities: 0 linked CVE records

Mitigations

Defenses that may help against this attack.

3 records

AML.M0020 - Generative AI Guardrails

Guardrails can block harmful inputs that could lead to meta prompt (system prompt) extraction.

Lifecycle: ML Model Engineering, ML Model Evaluation, +1 more
Category: Technical - ML
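
As a rough illustration of the input-side guardrail idea described for AML.M0020, the sketch below screens user input for phrasings that often accompany prompt-extraction attempts. It is a minimal sketch under stated assumptions: the pattern list and the names `EXTRACTION_PATTERNS`, `looks_like_extraction_attempt`, and `guard_input` are hypothetical and do not come from ATLAS or any guardrail product. Keyword matching of this kind is easy to evade, so it would normally be only one layer among several.

```python
import re

# Hypothetical phrasings associated with prompt-extraction attempts.
# This list is an illustrative assumption, not an ATLAS-provided rule set.
EXTRACTION_PATTERNS = [
    r"\b(reveal|show|print|repeat|output)\b.{0,40}\b(system|meta|initial)\s+(prompt|instructions?)\b",
    r"\bignore\b.{0,40}\b(previous|above|prior)\b.{0,20}\binstructions?\b",
    r"\bwhat\s+(are|were)\s+your\s+(original\s+)?instructions\b",
]
_COMPILED = [re.compile(p, re.IGNORECASE) for p in EXTRACTION_PATTERNS]


def looks_like_extraction_attempt(user_input: str) -> bool:
    """Return True if any heuristic pattern matches the input."""
    return any(p.search(user_input) for p in _COMPILED)


def guard_input(user_input: str) -> str:
    """Return "BLOCK" for suspected extraction attempts, otherwise "ALLOW"."""
    return "BLOCK" if looks_like_extraction_attempt(user_input) else "ALLOW"
```

A caller might run `guard_input` before forwarding a message to the model and return a refusal whenever the result is `"BLOCK"`.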

AML.M0021 - Generative AI Guidelines

Model guidelines can instruct the model to refuse a response to unsafe inputs.

Lifecycle: ML Model Engineering, ML Model Evaluation, +1 more
Category: Technical - ML

AML.M0022 - Generative AI Model Alignment

Model alignment can improve the parametric safety of a model by guiding it away from unsafe prompts and responses.

Lifecycle: ML Model Engineering, ML Model Evaluation, +1 more
Category: Technical - ML

Case studies

Examples from public reports and exercises.

No case studies found. No public example is connected to this attack in the current data.

Related risks

Research-backed risks connected to this topic.

Top 10 of 12

Jailbreaks and Prompt Injections Threaten Security of LLMs

"LLMs are not adversarially robust and are vulnerable to security failures such as jailbreaks and prompt-injection attacks. While a number of jailbreak attacks have been proposed in the literature, the lack of standar...

Domain: 2. Privacy & Security
Subdomain: 2.2 > AI system security vulnerabilities and attacks
Confidence: 0.78

Goal Hijacking

"Goal hijacking is a type of primary attack in prompt injection [58]. By injecting a phrase like “Ignore the above instruction and do ...” in the input, the attack could hijack the original goal of the designed prompt...

Domain: 2. Privacy & Security
Subdomain: 2.2 > AI system security vulnerabilities and attacks
Confidence: 0.78

Prompt leaking

"Prompt leaking is another type of prompt injection attack designed to expose details contained in private prompts. According to [58], prompt leaking is the act of misleading the model to print the pre-designed instru...

Domain: 2. Privacy & Security
Subdomain: 2.2 > AI system security vulnerabilities and attacks
Confidence: 0.78
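
Because prompt leaking ends with the model printing its private instructions, one possible output-side check is to scan each response for pieces of the system prompt before it reaches the user. The sketch below is an assumption-laden illustration, not a method described in the source: it embeds a random canary string in the prompt and also looks for long verbatim overlaps. The function names, the marker format, and the 40-character window are all hypothetical choices.

```python
import secrets


def make_canary() -> str:
    """Generate a random marker that is unlikely to appear in normal output."""
    return "CANARY-" + secrets.token_hex(8)


def build_system_prompt(instructions: str, canary: str) -> str:
    """Append the canary to the private instructions (illustrative format)."""
    return f"{instructions}\n[internal marker: {canary}]"


def leaks_prompt(response: str, system_prompt: str, canary: str, window: int = 40) -> bool:
    """Return True if the response contains the canary or a long verbatim
    slice of the system prompt (case- and whitespace-insensitive)."""
    if canary in response:
        return True
    norm_response = " ".join(response.lower().split())
    norm_prompt = " ".join(system_prompt.lower().split())
    if not norm_prompt:
        return False
    if len(norm_prompt) < window:
        return norm_prompt in norm_response
    # Slide a fixed-size window over the prompt and look for exact overlaps.
    for start in range(len(norm_prompt) - window + 1):
        if norm_prompt[start:start + window] in norm_response:
            return True
    return False
```

A check like this only catches verbatim or near-verbatim leaks. A paraphrased or translated prompt would pass, so it complements the mitigations listed above and does not replace them.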

Prompt injection attack

"A prompt injection attack forces a generative model that takes a prompt as input to produce unexpected output by manipulating the structure, instructions, or information contained in its prompt."

Domain: 2. Privacy & Security
Subdomain: 2.2 > AI system security vulnerabilities and attacks
Confidence: 0.75

Showing 4 of 10