Goal Hijacking - PromptRiskDB

Record summary

A quick snapshot of what this page covers.

Techniques: 4. Attack methods connected to this risk.

Mitigations: 4. Defenses that may help with related attacks.

Domain: 2. Privacy & Security. The broad risk area this belongs to.

Risk profile

How this risk is described and categorized.

Domain: 2. Privacy & Security

Subdomain: 2.2 > AI system security vulnerabilities and attacks

Entity: 1 - Human

Intent: 1 - Intentional

Timing: 2 - Post-deployment

Category: Instruction Attacks

Subcategory: Goal Hijacking

Related techniques

Attack methods connected to this risk.

AML.T0056 - Extract LLM System Prompt

feasible

Method: taxonomy_keyword_rule; Confidence: 61%

AML.T0080.000 - Memory

demonstrated

Method: taxonomy_keyword_rule; Confidence: 58%

AML.T0069.002 - System Prompt

demonstrated

Method: text_similarity_sqlite; Confidence: 57%

AML.T0080 - AI Agent Context Poisoning

demonstrated

Method: taxonomy_keyword_rule; Confidence: 56%

Suggested mitigations

Defenses that may help with related attacks.

Generative AI Guardrails

Lifecycle: ML Model Engineering, ML Model Evaluation, +1 more

Category: Technical - ML

Generative AI Guidelines

Lifecycle: ML Model Engineering, ML Model Evaluation, +1 more

Category: Technical - ML

Generative AI Model Alignment

Lifecycle: ML Model Engineering, ML Model Evaluation, +1 more

Category: Technical - ML

Memory Hardening

Lifecycle: ML Model Engineering, Deployment, +1 more

Category: Technical - ML

Source

Research source for this risk, when available.

Included resource

Safety Assessment of Chinese Large Language Models

Authors: Sun et al.; Year: 2023; Type: Preprint

DOI: 10.48550/arXiv.2304.10436; URL: https://arxiv.org/abs/2304.10436

Original source

MIT AI Risk Repository

Open the public repository used for AI risk records and taxonomy fields.

Repository: https://airisk.mit.edu/