APromptRiskDBThreat intelligence atlas
AI Security Technique

Extract LLM System Prompt - AI Security Technique

Adversaries may attempt to extract a large language model's (LLM) system prompt. This can be done via prompt injection to induce the model to reveal its own system prompt or may be extracted from a configuration file. System prompts can be a portion of an AI provider's competitive advantage and are thus valuable intellectual property that may be targeted by adversaries.

AI Security TechniquefeasibleExfiltration

Record summary

A quick snapshot of what this page covers.

Tactics1Attacker goals connected to this method.
Mitigations3Defenses that may help against this attack.
AI risks12Research-backed risks connected to this topic.

Attack context

How this AI attack works in practice.

ATLAS ID
AML.T0056
Priority score
79
Maturity: feasible
Exfiltration

Mitigations

Defenses that may help against this attack.

AML.M0020 - Generative AI Guardrails

ML Model EngineeringML Model Evaluation+1 more
LifecycleML Model Engineering + 2 moreCategoryTechnical - ML

Guardrails can prevent harmful inputs that can lead to meta prompt extraction.

AML.M0021 - Generative AI Guidelines

ML Model EngineeringML Model Evaluation+1 more
LifecycleML Model Engineering + 2 moreCategoryTechnical - ML

Model guidelines can instruct the model to refuse a response to unsafe inputs.

AML.M0022 - Generative AI Model Alignment

ML Model EngineeringML Model Evaluation+1 more
LifecycleML Model Engineering + 2 moreCategoryTechnical - ML

Model alignment can improve the parametric safety of a model by guiding it away from unsafe prompts and responses.

Case studies

Examples from public reports and exercises.

No case studies found. No public example is connected to this attack in the current data.

Source

Where this page information comes from.