PromptRiskDBThreat intelligence atlas
AI Risk

Fine-tuning related (Degrading safety training due to benign fine-tuning)

"When downstream providers of AI systems fine-tune AI models to be more suitable for their needs, the resulting AI model can be more likely to produce undesired or harmful outputs (as compared to the non-fine-tuned model), even if the fine-tuning was done with harmless and commonly used data [154]."

AI Risk7. AI System Safety, Failures, & Limitations7.0 > AI system safety, failures, & limitations2 - Post-deployment

Record summary

A quick snapshot of what this page covers.

Techniques0Attack methods connected to this risk.
Mitigations0Defenses that may help with related attacks.
Domain7. AI System Safety, Failures, & LimitationsThe broad risk area this belongs to.

Risk profile

How this risk is described and categorized.

Domain7. AI System Safety, Failures, & Limitations
Subdomain7.0 > AI system safety, failures, & limitations
Entity1 - Human
Intent2 - Unintentional
Timing2 - Post-deployment
CategoryModel Development
SubcategoryFine-tuning related (Degrading safety training due to benign fine-tuning)

Suggested mitigations

Defenses that may help with related attacks.

No propagated mitigations. No defense is available through the connected attack methods.

Source

Research source for this risk, when available.