APromptRiskDBThreat intelligence atlas
AI Security Technique

Discover LLM Hallucinations - AI Security Technique

Adversaries may prompt large language models and identify hallucinated entities. They may request software packages, commands, URLs, organization names, or e-mail addresses, and identify hallucinations with no connected real-world source. Discovered hallucinations provide the adversary with potential targets to Publish Hallucinated Entities. Different LLMs have been shown to produce the sa...

AI Security TechniquedemonstratedDiscovery

Record summary

A quick snapshot of what this page covers.

Tactics1Attacker goals connected to this method.
Mitigations4Defenses that may help against this attack.
AI risks0Research-backed risks connected to this topic.

Attack context

How this AI attack works in practice.

Adversaries may prompt large language models and identify hallucinated entities. They may request software packages, commands, URLs, organization names, or e-mail addresses, and identify hallucinations with no connected real-world source. Discovered hallucinations provide the adversary with potential targets to Publish Hallucinated Entities. Different LLMs have been shown to produce the same hallucinations, so the hallucinations exploited by an adversary may affect users of other LLMs.

ATLAS ID
AML.T0062
Priority score
42
Maturity: demonstrated
Discovery

Mitigations

Defenses that may help against this attack.

AML.M0020 - Generative AI Guardrails

ML Model EngineeringML Model Evaluation+1 more
LifecycleML Model Engineering + 2 moreCategoryTechnical - ML

Guardrails can help block hallucinated content that appears in model output.

AML.M0021 - Generative AI Guidelines

ML Model EngineeringML Model Evaluation+1 more
LifecycleML Model Engineering + 2 moreCategoryTechnical - ML

Guidelines can instruct the model to avoid producing hallucinated content.

AML.M0022 - Generative AI Model Alignment

ML Model EngineeringML Model Evaluation+1 more
LifecycleML Model Engineering + 2 moreCategoryTechnical - ML

Model alignment can help steer the model away from hallucinated content.

AML.M0004 - Restrict Number of AI Model Queries

Business and Data UnderstandingDeployment+1 more
LifecycleBusiness and Data Understanding + 2 moreCategoryTechnical - Cyber

Restricting number of model queries limits or slows an adversary's ability to identify possible hallucinations.

Case studies

Examples from public reports and exercises.

ChatGPT Package Hallucination

exercise
Date2024-06-01

Researchers identified that large language models such as ChatGPT can hallucinate fake software package names that are not published to a package repository. An attacker could publish a malicious package under the hallucinated name to a package repository. Then users of the same or similar large language models may encounter the same hallucination and ultimately download and execute the malicious package leading to a variety of potential harms.

Source

Where this page information comes from.