Record summary
A quick snapshot of what this page covers.
Risk profile
How this risk is described and categorized.
"How do we get an AGI to work towards the right goals? MIRI calls this value specification. Bostrom (2014) discusses this problem at length, arguing that it is much harder than one might naively think. Davis (2015) criticizes Bostrom’s argument, and Bensinger (2015) defends Bostrom against Davis’ criticism. Reward corruption, reward gaming, and negative side effects are subproblems of value specification highlighted in the DeepMind and OpenAI agendas."
Suggested mitigations
Defenses that may help with related attacks.
Source
Research source for this risk, when available.
Included resource
AGI Safety Literature Review
Original source
MIT AI Risk Repository
