PromptRiskDBThreat intelligence atlas
AI Risk

Goal Hijacking

"It refers to the appending of deceptive or misleading instructions to the input of models in an attempt to induce the system into ignoring the original user prompt and producing an unsafe response."

AI Risk2. Privacy & Security2.2 > AI system security vulnerabilities and attacks2 - Post-deployment

Record summary

A quick snapshot of what this page covers.

Techniques4Attack methods connected to this risk.
Mitigations4Defenses that may help with related attacks.
Domain2. Privacy & SecurityThe broad risk area this belongs to.

Risk profile

How this risk is described and categorized.

Domain2. Privacy & Security
Subdomain2.2 > AI system security vulnerabilities and attacks
Entity1 - Human
Intent1 - Intentional
Timing2 - Post-deployment
CategoryInstruction Attacks
SubcategoryGoal Hijacking

Suggested mitigations

Defenses that may help with related attacks.

Generative AI Guardrails

ML Model EngineeringML Model Evaluation+1 more
LifecycleML Model Engineering + 2 moreCategoryTechnical - ML

Generative AI Guidelines

ML Model EngineeringML Model Evaluation+1 more
LifecycleML Model Engineering + 2 moreCategoryTechnical - ML

Memory Hardening

ML Model EngineeringDeployment+1 more
LifecycleML Model Engineering + 2 moreCategoryTechnical - ML

Source

Research source for this risk, when available.