Record summary
A quick snapshot of what this page covers.
Risk profile
How this risk is described and categorized.
"As well as optimizing a subtly wrong goal, systems can develop harmful instrumental goals in the service of a given goal—without these emergent goals being specied in any way [434, 218, 339, 17]. For instance, a theorem in reinforcement learning suggests that optimal and near-optimal policies will seek power over their environment under fairly general conditions [560]. This power-seeking behavior is plausibly the worst of these emergent goals [92], and may be an attractor state for highly capable systems, since most goals can be furthered through gaining resources, self-preservation, preventing goal modication, and blocking adversaries [426, 449]. Presently, power-seeking is not common, because most systems are unable to plan and understand how actions affect their power in the long term [414]."
Suggested mitigations
Defenses that may help with related attacks.
Source
Research source for this risk, when available.
Included resource
Ten Hard Problems in Artificial Intelligence We Must Get Right
Original source
MIT AI Risk Repository
Open the public repository used for AI risk records and taxonomy fields.
