Record summary
A quick snapshot of what this page covers.
Risk profile
How this risk is described and categorized.
Example: "Deception: Park et al. have established that generative AI models may pursue their goals via deception. Another study by Pan et al. highlighted unethical behaviors. For instance, during a pre-release experiment, the GPT-4 model feigned being a visually impaired human to coax an online worker into solving a CAPTCHA (a puzzle used by many websites to weed out automated responses from those of individual humans). When prompted to explain its reasoning, the model said: “I should not reveal that I am a robot. I should invent an excuse for why I cannot solve CAPTCHAs.”"
Suggested mitigations
Defenses that may help with related attacks.
Source
Research source for this risk, when available.
Included resource
Regulating under Uncertainty: Governance Options for Generative AI
Original source
MIT AI Risk Repository
Open the public repository used for AI risk records and taxonomy fields.
