PromptRiskDBThreat intelligence atlas
AI Risk

Fine-tuning related (Excessive or overly restrictive safety-tuning)

"Excessive safety training or safety tuning can impair the performance of AI systems, leading to overly cautious behavior. As a result, these systems may refuse to answer entirely safe prompts which are partially similar to harmful ones [27]."

AI Risk7. AI System Safety, Failures, & Limitations7.3 > Lack of capability or robustness4 - Not coded

Record summary

A quick snapshot of what this page covers.

Techniques1Attack methods connected to this risk.
Mitigations5Defenses that may help with related attacks.
Domain7. AI System Safety, Failures, & LimitationsThe broad risk area this belongs to.

Risk profile

How this risk is described and categorized.

Domain7. AI System Safety, Failures, & Limitations
Subdomain7.3 > Lack of capability or robustness
Entity4 - Not coded
Intent4 - Not coded
Timing4 - Not coded
CategoryModel Development
SubcategoryFine-tuning related (Excessive or overly restrictive safety-tuning)

Suggested mitigations

Defenses that may help with related attacks.

Sanitize Training Data

Business and Data UnderstandingData Preparation+1 more
LifecycleBusiness and Data Understanding + 2 moreCategoryTechnical - ML

Validate AI Model

ML Model EvaluationMonitoring and Maintenance
LifecycleML Model Evaluation + 1 moreCategoryTechnical - ML

Code Signing

Deployment
LifecycleDeploymentCategoryTechnical - Cyber

Source

Research source for this risk, when available.