Value-related risks in LLMs

Record summary

A quick snapshot of what this page covers.

Techniques: 0. Attack methods connected to this risk.

Mitigations: 0. Defenses that may help with related attacks.

Domain: 7. AI System Safety, Failures, & Limitations. The broad risk area this belongs to.

Risk profile

How this risk is described and categorized.

"As the general capabilities of LLM-empowered systems improve, the negative consequences and risks induced by these systems also get increasingly alarming accordingly, especially in high-stakes areas [28, 146]. Although they may not be intentionally introduced, severe problematic issues related to human values can be raised. Specifically, even before language models become extremely large, pre-trained language models have already exhibited a certain degree of value judgments. For example, Schramowski et al. [171] reveal the existence of the moral direction with the sentence embeddings of moral questions. However, the distribution of the pre-training corpora may not match exactly with that of the human society [56] and pieces of knowledge are not guaranteed to be equally learned. As a result, value mismatches may occur."

Domain: 7. AI System Safety, Failures, & Limitations

Subdomain: 7.1 > AI pursuing its own goals in conflict with human goals or values

Entity: 3 - Other

Intent: 2 - Unintentional

Timing: 3 - Other

Category: Inherent Risk

Subcategory: Value-related risks in LLMs

Related techniques

Attack methods connected to this risk.

No linked attack methods. No AI attack method is connected to this risk in the current data.

Suggested mitigations

Defenses that may help with related attacks.

No propagated mitigations. No defense is available through the connected attack methods.

Source

Research source for this risk, when available.

Included resource

A Survey on Responsible LLMs: Inherent Risk, Malicious Use, and Mitigation Strategy

Authors: Wang et al.

Year: 2025

Type: Preprint

DOI: 10.48550/arXiv.2501.09431

URL: https://arxiv.org/abs/2501.09431

Original source

MIT AI Risk Repository

The public repository used for AI risk records and taxonomy fields.

Repository: https://airisk.mit.edu/