Record summary
A quick snapshot of what this page covers.
Risk profile
How this risk is described and categorized.
"LMs are typically trained in few languages, and perform less well in other languages [95, 162]. In part, this is due to unavailability of training data: there are many widely spoken languages for which no systematic efforts have been made to create labelled training datasets, such as Javanese which is spoken by more than 80 million people [95]. Training data is particularly missing for languages that are spoken by groups who are multilingual and can use a technology in English, or for languages spoken by groups who are not the primary target demographic for new technologies."
Source
Research source for this risk, when available.
Included resource
Taxonomy of Risks posed by Language Models
Original source
MIT AI Risk Repository