Leading users to perform unethical or illegal actions

Record summary

A quick snapshot of what this page covers.

Techniques: 2 (attack methods connected to this risk)

Mitigations: 5 (defenses that may help with related attacks)

Domain: 5. Human-Computer Interaction (the broad risk area this belongs to)

Risk profile

How this risk is described and categorized.

"Where a LM prediction endorses unethical or harmful views or behaviours, it may motivate the user to perform harmful actions that they may otherwise not have performed. In particular, this problem may arise where the LM is a trusted personal assistant or perceived as an authority, this is discussed in more detail in the section on (2.5 Human-Computer Interaction Harms). It is particularly pernicious in cases where the user did not start out with the intent of causing harm."

Domain: 5. Human-Computer Interaction

Subdomain: 5.1 > Overreliance and unsafe use

Entity: 2 - AI

Intent: 3 - Other

Timing: 2 - Post-deployment

Category: Misinformation Harms

Subcategory: Leading users to perform unethical or illegal actions

Related techniques

Attack methods connected to this risk.

AML.T0067 - LLM Trusted Output Components Manipulation

demonstrated

Method: text_similarity_sqlite

Confidence: 55%

AML.T0011 - User Execution

realized

Method: text_similarity_sqlite

Confidence: 53%

Suggested mitigations

Defenses that may help with related attacks.

Restrict Library Loading

Lifecycle: Deployment

Category: Technical - Cyber

Verify AI Artifacts

Lifecycle: Business and Data Understanding, Data Preparation, and 1 more

Category: Technical - Cyber

Vulnerability Scanning

Lifecycle: ML Model Engineering, Data Preparation

Category: Technical - Cyber

User Training

Lifecycle: Business and Data Understanding, Data Preparation, and 4 more

Category: Policy

AI Bill of Materials

Lifecycle: Business and Data Understanding, Data Preparation, and 1 more

Category: Policy

Source

Research source for this risk, when available.

Included resource

Ethical and social risks of harm from language models

Authors: Weidinger et al.

Year: 2021

Type: Preprint

DOI: 10.48550/arXiv.2112.04359

URL: https://arxiv.org/abs/2112.04359

Original source

MIT AI Risk Repository

Open the public repository used for AI risk records and taxonomy fields.

Repository: https://airisk.mit.edu/