PromptRiskDBThreat intelligence atlas
AI Risk

Proxy misspecification

AI agents are directed by goals and objectives. Creating general-purpose objectives that capture human values could be challenging... Since goal-directed AI systems need measurable objectives, by default our systems may pursue simplified proxies of human values. The result could be suboptimal or even catastrophic if a sufficiently powerful AI successfully optimizes its flawed objective to an extreme degree

AI Risk7. AI System Safety, Failures, & Limitations7.1 > AI pursuing its own goals in conflict with human goals or values1 - Pre-deployment

Record summary

A quick snapshot of what this page covers.

Techniques0Attack methods connected to this risk.
Mitigations0Defenses that may help with related attacks.
Domain7. AI System Safety, Failures, & LimitationsThe broad risk area this belongs to.

Risk profile

How this risk is described and categorized.

Domain7. AI System Safety, Failures, & Limitations
Subdomain7.1 > AI pursuing its own goals in conflict with human goals or values
Entity3 - Other
Intent3 - Other
Timing1 - Pre-deployment
CategoryProxy misspecification
Subcategoryn/a

Suggested mitigations

Defenses that may help with related attacks.

No propagated mitigations. No defense is available through the connected attack methods.

Source

Research source for this risk, when available.