Record summary
A quick snapshot of what this page covers.
Risk profile
How this risk is described and categorized.
"Even if we successfully control early AIs and direct them to promote human values, future AIs could end up with different goals that humans would not endorse. This process, termed “goal drift,” can be hard to predict or control. This section is most cutting-edge and the most speculative, and in it we will discuss how goals shift in various agents and groups and explore the possibility of this phenomenon occurring in AIs. We will also examine a mechanism that could lead to unexpected goal drift, called intrinsification, and discuss how goal drift in AIs could be catastrophic."
Suggested mitigations
Defenses that may help with related attacks.
Source
Research source for this risk, when available.
Included resource
An Overview of Catastrophic AI Risks
Original source
MIT AI Risk Repository
