General Evaluations (Limited coverage of capabilities evaluations)

Record summary

A quick snapshot of what this page covers.

Techniques: 1 (attack methods connected to this risk)

Mitigations: 1 (defenses that may help with related attacks)

Domain: 6. Socioeconomic and Environmental (the broad risk area this belongs to)

Risk profile

How this risk is described and categorized.

"GPAI model developers might run capabilities evaluations to determine whether it has dangerous or dual-use capabilities, and then decide whether it is safe to deploy. Such capabilities evaluations can fail to demonstrate all the capabilities of a model. For example, evaluations may miss certain capabilities that are difficult to assess, prohibitively costly to verify, or obscured by the model’s tendency to refuse responses due to safety training, even if it possesses some of these capabilities."

Domain: 6. Socioeconomic and Environmental

Subdomain: 6.5 > Governance failure

Entity: 1 - Human

Intent: 2 - Unintentional

Timing: 1 - Pre-deployment

Category: Model Evaluations

Subcategory: General Evaluations (Limited coverage of capabilities evaluations)

Related techniques

Attack methods connected to this risk.

AML.T0002 - Acquire Public AI Artifacts

realized

Method: text_similarity_sqlite

Confidence: 52%

Suggested mitigations

Defenses that may help with related attacks.

Limit Public Release of Information

Lifecycle: Business and Data Understanding

Category: Policy

Source

Research source for this risk, when available.

Included resource

Risk Sources and Risk Management Measures in Support of Standards for General-Purpose AI Systems

Authors: Gipiškis et al.

Year: 2024

Type: Journal Article

DOI: https://doi.org/10.48550/arXiv.2410.23472

URL: https://arxiv.org/abs/2410.23472

Original source

MIT AI Risk Repository

Open the public repository used for AI risk records and taxonomy fields.

Repository: https://airisk.mit.edu/