Fine-tuning related (Unexpected competence in fine-tuned versions of the upstream model)

Record summary

A quick snapshot of what this page covers.

Techniques: 1. Attack methods connected to this risk.

Mitigations: 5. Defenses that may help with related attacks.

Domain: 7. AI System Safety, Failures, & Limitations. The broad risk area this belongs to.

How this risk is described and categorized.

Domain: 7. AI System Safety, Failures, & Limitations

Subdomain: 7.2 > AI possessing dangerous capabilities

Entity: 1 - Human

Intent: 2 - Unintentional

Timing: 1 - Pre-deployment

Category: Model Development

Subcategory: Fine-tuning related (Unexpected competence in fine-tuned versions of the upstream model)

Attack methods connected to this risk.

realized

Method: text_similarity_sqlite; Confidence: 53%

Defenses that may help with related attacks.

Business and Data Understanding, Data Preparation, +2 more

Lifecycle: Business and Data Understanding + 3 more; Category: Policy

ML Model Engineering

Lifecycle: ML Model Engineering; Category: Technical - ML

ML Model Evaluation, Monitoring and Maintenance

Lifecycle: ML Model Evaluation + 1 more; Category: Technical - ML

Deployment

Lifecycle: Deployment; Category: Technical - Cyber

Deployment

Lifecycle: Deployment; Category: Policy

Research source for this risk, when available.

Included resource

Authors: Gipiškis et al.; Year: 2024; Type: Journal Article
