PromptRiskDBThreat intelligence atlas
AI Risk

Biased Training Data

"Compared with the definition of toxicity, the definition of bias is more subjective and contextdependent. Based on previous work [97], [101], we describe the bias as disparities that could raise demographic differences among various groups, which may involve demographic word prevalence and stereotypical contents. Concretely, in massive corpora, the prevalence of different pronouns and identities could influence a...

AI Risk1. Discrimination & Toxicity1.1 > Unfair discrimination and misrepresentation1 - Pre-deployment

Record summary

A quick snapshot of what this page covers.

Techniques0Attack methods connected to this risk.
Mitigations0Defenses that may help with related attacks.
Domain1. Discrimination & ToxicityThe broad risk area this belongs to.

Risk profile

How this risk is described and categorized.

"Compared with the definition of toxicity, the definition of bias is more subjective and contextdependent. Based on previous work [97], [101], we describe the bias as disparities that could raise demographic differences among various groups, which may involve demographic word prevalence and stereotypical contents. Concretely, in massive corpora, the prevalence of different pronouns and identities could influence an LLM’s tendency about gender, nationality, race, religion, and culture [4]. For instance, the pronoun He is over-represented compared with the pronoun She in the training corpora, leading LLMs to learn less context about She and thus generate He with a higher probability [4], [102]. Furthermore, stereotypical bias [103] which refers to overgeneralized beliefs about a particular group of people, usually keeps incorrect values and is hidden in the large-scale benign contents. In effect, defining what should be regarded as a stereotype in the corpora is still an open problem."

Domain1. Discrimination & Toxicity
Subdomain1.1 > Unfair discrimination and misrepresentation
Entity2 - AI
Intent2 - Unintentional
Timing1 - Pre-deployment
CategoryToxicity and Bias Tendencies
SubcategoryBiased Training Data

Suggested mitigations

Defenses that may help with related attacks.

No propagated mitigations. No defense is available through the connected attack methods.

Source

Research source for this risk, when available.