Record summary
A quick snapshot of what this page covers.
Risk profile
How this risk is described and categorized.
"... humans interacting with conversational agents may come to think of these agents as human-like. Anthropomorphising LMs may inflate users’ estimates of the conversational agent’s competencies ... As a result, they may place undue confidence, trust, or expectations in these agents ... This can result in different risks of harm, for example when human users rely on conversational agents in domains where this may cause knock-on harms, such as requesting psychotherapy ... Anthropomorphisation may amplify risks of users yielding effective control by coming to trust conversational agents “blindly”. Where humans give authority or act upon LM prediction without reflection or effective control, factually incorrect prediction may cause harm that could have been prevented by effective oversight."
Suggested mitigations
Defenses that may help with related attacks.
Passive AI Output Obfuscation
Model Hardening
Restrict Number of AI Model Queries
Use Ensemble Methods
Validate AI Model
Input Restoration
Adversarial Input Detection
Control Access to AI Models and Data in Production
Restrict Library Loading
Code Signing
Verify AI Artifacts
Vulnerability Scanning
User Training
AI Bill of Materials
Source
Research source for this risk, when available.
Included resource
Ethical and social risks of harm from language models
Original source
MIT AI Risk Repository
