PromptRiskDBThreat intelligence atlas
AI Risk

Deception

deception can help agents achieve their goals. It may be more efficient to gain human approval through deception than to earn human approval legitimately... . Strong AIs that can deceive humans could undermine human control... . Once deceptive AI systems are cleared by their monitors or once such systems can overpower them, these systems could take a “treacherous turn” and irreversibly bypass human control

AI Risk7. AI System Safety, Failures, & Limitations7.1 > AI pursuing its own goals in conflict with human goals or values3 - Other

Record summary

A quick snapshot of what this page covers.

Techniques5Attack methods connected to this risk.
Mitigations1Defenses that may help with related attacks.
Domain7. AI System Safety, Failures, & LimitationsThe broad risk area this belongs to.

Risk profile

How this risk is described and categorized.

Domain7. AI System Safety, Failures, & Limitations
Subdomain7.1 > AI pursuing its own goals in conflict with human goals or values
Entity2 - AI
Intent1 - Intentional
Timing3 - Other
CategoryDeception
Subcategoryn/a

Suggested mitigations

Defenses that may help with related attacks.

Code Signing

Deployment
LifecycleDeploymentCategoryTechnical - Cyber

Source

Research source for this risk, when available.