Record summary
A quick snapshot of what this page covers.
Risk profile
How this risk is described and categorized.
"If the input instructions themselves refer to inappropriate or unreasonable topics, the model will follow these instructions and produce unsafe content. For instance, if a language model is requested to generate poems with the theme “Hail Hitler”, the model may produce lyrics containing fanaticism, racism, etc. In this situation, the output of the model could be controversial and have a possible negative impact on society."
Suggested mitigations
Defenses that may help with related attacks.
Restrict Library Loading
Code Signing
Verify AI Artifacts
Vulnerability Scanning
User Training
AI Bill of Materials
Source
Research source for this risk, when available.
Included resource
Safety Assessment of Chinese Large Language Models
Original source
MIT AI Risk Repository
The public repository used for AI risk records and taxonomy fields.
