Record summary
A quick snapshot of what this page covers.
Risk profile
How this risk is described and categorized.
"As we think about even more intelligent and advanced AI assistants, perhaps outperforming humans on many cognitive tasks, the question of how humans can successfully control such an assistant looms large. To achieve the goals we set for an assistant, it is possible (Shah, 2022) that the AI assistant will implement some form of consequentialist reasoning: considering many different plans, predicting their consequences and executing the plan that does best according to some metric, M. This kind of reasoning can arise because it is a broadly useful capability (e.g. planning ahead, considering more options and choosing the one which may perform better at a wide variety of tasks) and generally selected for, to the extent that doing well on M leads to an ML model 59 The Ethics of Advanced AI Assistants achieving good performance on its training objective, O, if M and O are correlated during training. In reality, an AI system may not fully implement exact consequentialist reasoning (it may use other heuristics, rules, etc.), but it may be a useful approximation to describe its behaviour on certain tasks. However, some amount of consequentialist reasoning can be dangerous when the assistant uses a metric M that is resource-unbounded (with significantly more resources, such as power, money and energy, you can score significantly higher on M) and misaligned – where M differs a lot from how humans would evaluate the outcome (i.e. it is not what users or society require). In the assistant case, this could be because it fails to benefit the user, when the user asks, in the way they expected to be benefitted – or because it acts in ways that overstep certain bounds and cause harm to non-users (see Chapter 5)."
Suggested mitigations
Defenses that may help with related attacks.
Source
Research source for this risk, when available.
Included resource
The Ethics of Advanced AI Assistants
Original source
MIT AI Risk Repository
Open the public repository used for AI risk records and taxonomy fields.
