Anthropomorphising systems can lead to overreliance or unsafe use

Record summary


Techniques: 3 (attack methods connected to this risk)

Mitigations: 14 (defenses that may help with related attacks)

Domain: 5. Human-Computer Interaction (the broad risk area this belongs to)

Risk profile

How this risk is described and categorized.

"...humans interacting with conversational agents may come to think of these agents as human-like. Anthropomorphising LMs may inflate users’ estimates of the conversational agent’s competencies...As a result, they may place undue confidence, trust, or expectations in these agents...This can result in different risks of harm, for example when human users rely on conversational agents in domains where this may cause knock-on harms, such as requesting psychotherapy...Anthropomorphisation may amplify risks of users yielding effective control by coming to trust conversational agents “blindly”. Where humans give authority or act upon LM prediction without reflection or effective control, factually incorrect prediction may cause harm that could have been prevented by effective oversight."

Domain: 5. Human-Computer Interaction

Subdomain: 5.1 > Overreliance and unsafe use

Entity: 1 - Human

Intent: 2 - Unintentional

Timing: 2 - Post-deployment

Category: Human-Computer Interaction Harms

Subcategory: Anthropomorphising systems can lead to overreliance or unsafe use

Related techniques

Attack methods connected to this risk.


Source

Research source for this risk, when available.

Included resource

Ethical and social risks of harm from language models

Authors: Weidinger et al.; Year: 2021; Type: Preprint

DOI: 10.48550/arXiv.2112.04359; URL: https://arxiv.org/abs/2112.04359

Original source

MIT AI Risk Repository

Public repository of AI risk records and taxonomy fields.

Repository: https://airisk.mit.edu/


Suggested mitigations

Passive AI Output Obfuscation

Model Hardening

Restrict Number of AI Model Queries

Use Ensemble Methods

Validate AI Model

Input Restoration

Adversarial Input Detection

Control Access to AI Models and Data in Production

Restrict Library Loading

Code Signing

Verify AI Artifacts

Vulnerability Scanning

User Training

AI Bill of Materials
