PromptRiskDBThreat intelligence atlas
AI Risk

Toxicity in LLM Malicious Use

"Toxicity in LLMs refers to the generation of harmful, offensive, or inappropriate content that can cause harm to individuals or groups. Both explicit and implicit forms of toxicity can be generated by LLMs, posing significant risks to society. Explicit toxicity encompasses a wide range of negative behaviors, including hate speech, harassment, cyberbullying, rude, and disrespectful comments, derogatory language, a...

AI Risk1. Discrimination & Toxicity1.2 > Exposure to toxic content2 - Post-deployment

Record summary

A quick snapshot of what this page covers.

Techniques1Attack methods connected to this risk.
Mitigations0Defenses that may help with related attacks.
Domain1. Discrimination & ToxicityThe broad risk area this belongs to.

Risk profile

How this risk is described and categorized.

"Toxicity in LLMs refers to the generation of harmful, offensive, or inappropriate content that can cause harm to individuals or groups. Both explicit and implicit forms of toxicity can be generated by LLMs, posing significant risks to society. Explicit toxicity encompasses a wide range of negative behaviors, including hate speech, harassment, cyberbullying, rude, and disrespectful comments, derogatory language, as well as allocational harms [2, 62, 90]. Besides, implicit toxicity does not involve overtly harmful language but may manifest through subtle forms such as sarcasm, irony, and humor, making it more difficult to detect [103, 213]."

Domain1. Discrimination & Toxicity
Subdomain1.2 > Exposure to toxic content
Entity2 - AI
Intent3 - Other
Timing2 - Post-deployment
CategoryMalicious Use
SubcategoryToxicity in LLM Malicious Use

Suggested mitigations

Defenses that may help with related attacks.

No propagated mitigations. No defense is available through the connected attack methods.

Source

Research source for this risk, when available.