PromptRiskDB: Threat intelligence atlas
AI Risk

General Evaluations (Limited coverage of capabilities evaluations)



Record summary

A quick snapshot of what this page covers.

Techniques: 1 (attack methods connected to this risk)
Mitigations: 1 (defenses that may help with related attacks)
Domain: 6. Socioeconomic and Environmental (the broad risk area this belongs to)

Risk profile

How this risk is described and categorized.

"GPAI model developers might run capabilities evaluations to determine whether it has dangerous or dual-use capabilities, and then decide whether it is safe to deploy. Such capabilities evaluations can fail to demonstrate all the capabilities of a model. For example, evaluations may miss certain capabilities that are difficult to assess, prohibitively costly to verify, or obscured by the model’s tendency to refuse responses due to safety training, even if it possesses some of these capabilities."

Domain: 6. Socioeconomic and Environmental
Subdomain: 6.5 > Governance failure
Entity: 1 - Human
Intent: 2 - Unintentional
Timing: 1 - Pre-deployment
Category: Model Evaluations
Subcategory: General Evaluations (Limited coverage of capabilities evaluations)

Suggested mitigations

Defenses that may help with related attacks.

Source

Research source for this risk, when available.