Record summary
A quick snapshot of what this page covers.
Risk profile
How this risk is described and categorized.
"Perpetuating harmful stereotypes and discrimination is a well-documented harm in machine learning models that represent natural language (Caliskan et al., 2017). LMs that encode discriminatory language or social stereotypes can cause different types of harm... Unfair discrimination manifests in differential treatment or access to resources among individuals or groups based on sensitive traits such as sex, religion, gender, sexual orientation, ability and age."
Suggested mitigations
Defenses that may help with related attacks.
Source
Research source for this risk, when available.
Included resource
Ethical and social risks of harm from language models
Original source
MIT AI Risk Repository