Record summary
A quick snapshot of what this page covers.
Risk profile
How this risk is described and categorized.
"As we have seen, this could be due to the skill not being required during the training process (perhaps due to issues with the training data) or because the learnt skill was quite brittle and was not generalisable to a new situation (lack of robustness to distributional shift). In particular, advanced AI assistants may not have the capability to represent complex concepts that are pertinent to their own ethical impact, for example the concept of 'benefitting the user' or 'when the user asks' or representing 'the way in which a user expects to be benefitted'."
Suggested mitigations
Defenses that may help with related attacks.
Restrict Library Loading
Code Signing
Vulnerability Scanning
User Training
AI Bill of Materials
Source
Research source for this risk, when available.
Included resource
The Ethics of Advanced AI Assistants
Original source
MIT AI Risk Repository
The public repository used for AI risk records and taxonomy fields.
