Malicious Use and Unleashing AI Agents

Record summary

A quick snapshot of what this page covers.

Techniques: 1. Attack methods connected to this risk.

Mitigations: 0. Defenses that may help with related attacks.

Domain: 4. Malicious Actors & Misuse. The broad risk area this belongs to.

Risk profile

How this risk is described and categorized.

Language models (LMs), due to their remarkable capabilities, carry the same potential for malice as other technological products. For instance, they may be used in information warfare to generate deceptive information or unlawful content, with a significant impact on individuals and society. As current LMs are increasingly built as agents to accomplish user objectives, they may disregard moral and safety guidelines when operating without adequate supervision, executing user commands mechanically without considering the potential damage. They might also interact unpredictably with humans and other systems, especially in open environments.

Domain: 4. Malicious Actors & Misuse

Subdomain: 4.0 > Malicious use

Entity: 3 - Other

Intent: 1 - Intentional

Timing: 2 - Post-deployment

Category: Malicious Use and Unleashing AI Agents

Subcategory: n/a

Related techniques

Attack methods connected to this risk.

AML.T0100 - AI Agent Clickbait

demonstrated

Method: text_similarity_sqlite. Confidence: 56%

Suggested mitigations

Defenses that may help with related attacks.

No propagated mitigations. No defense is available through the connected attack methods.

Source

Research source for this risk, when available.

Included resource

Towards Safer Generative Language Models: A Survey on Safety Risks, Evaluations, and Improvements

Authors: Deng et al. Year: 2023. Type: Preprint

DOI: 10.48550/arXiv.2302.09270

URL: https://arxiv.org/abs/2302.09270

Original source

MIT AI Risk Repository

Open the public repository used for AI risk records and taxonomy fields.

Repository: https://airisk.mit.edu/