Record summary
A quick snapshot of what this page covers.
Risk profile
How this risk is described and categorized.
"GPAI model developers might run capabilities evaluations to determine whether it has dangerous or dual-use capabilities, and then decide whether it is safe to deploy. Such capabilities evaluations can fail to demonstrate all the capabilities of a model. For example, evaluations may miss certain capabilities that are difficult to assess, prohibitively costly to verify, or obscured by the model’s tendency to refuse responses due to safety training, even if it possesses some of these capabilities."
Suggested mitigations
Defenses that may help with related attacks.
Limit Public Release of Information
Source
Research source for this risk, when available.
Included resource
Risk Sources and Risk Management Measures in Support of Standards for General-Purpose AI Systems
Original source
MIT AI Risk Repository
Open the public repository used for AI risk records and taxonomy fields.
