PromptRiskDB: Threat intelligence atlas
AI Risk

Toxic Training Data

"Following previous studies [96], [97], toxic data in LLMs is defined as rude, disrespectful, or unreasonable language that is opposite to a polite, positive, and healthy language environment, including hate speech, offensive utterance, profanities, and threats [91]."

AI Risk · 1. Discrimination & Toxicity · 1.2 > Exposure to toxic content · 1 - Pre-deployment

Record summary

A quick snapshot of what this page covers.

Techniques: 0 (attack methods connected to this risk)
Mitigations: 0 (defenses that may help with related attacks)
Domain: 1. Discrimination & Toxicity (the broad risk area this belongs to)

Risk profile

How this risk is described and categorized.

Domain: 1. Discrimination & Toxicity
Subdomain: 1.2 > Exposure to toxic content
Entity: 2 - AI
Intent: 2 - Unintentional
Timing: 1 - Pre-deployment
Category: Toxicity and Bias Tendencies
Subcategory: Toxic Training Data

Suggested mitigations

Defenses that may help with related attacks.

No propagated mitigations. Because no attack methods are connected to this risk, no defenses are available through them.

Source

Research source for this risk, when available.