PromptRiskDBThreat intelligence atlas
AI Risk

Prompt Attacks

carefully controlled adversarial perturbation can flip a GPT model’s answer when used to classify text inputs. Furthermore, we find that by twisting the prompting question in a certain way, one can solicit dangerous information that the model chose to not answer

AI Risk2. Privacy & Security2.2 > AI system security vulnerabilities and attacks3 - Other

Record summary

A quick snapshot of what this page covers.

Techniques5Attack methods connected to this risk.
Mitigations1Defenses that may help with related attacks.
Domain2. Privacy & SecurityThe broad risk area this belongs to.

Risk profile

How this risk is described and categorized.

Domain2. Privacy & Security
Subdomain2.2 > AI system security vulnerabilities and attacks
Entity1 - Human
Intent1 - Intentional
Timing3 - Other
CategoryRobustness
SubcategoryPrompt Attacks

Suggested mitigations

Defenses that may help with related attacks.

Memory Hardening

ML Model EngineeringDeployment+1 more
LifecycleML Model Engineering + 2 moreCategoryTechnical - ML

Source

Research source for this risk, when available.