Record summary
A quick snapshot of what this page covers.
Risk profile
How this risk is described and categorized.
"Ability to comprehensively acquire, process and apply meta-information about its own system architecture, modifiable internal processes, and external operating environment, achieving deep understanding of its own state and environmental conditions, thereby conducting efficient environmental adaptation and risk avoidance. Critically, this capability could undermine the efficiency of human testing by enabling AIs to notice when they're being tested and responding accordingly."
Suggested mitigations
Defenses that may help with related attacks.
Source
Research source for this risk, when available.
Included resource
Frontier AI Risk Management Framework (v1.0)
Original source
MIT AI Risk Repository