Record summary
A quick snapshot of what this page covers.
Attack context
How this AI attack works in practice.
Adversaries may prompt large language models to identify hallucinated entities. They may request software packages, commands, URLs, organization names, or e-mail addresses, then check which of the returned entities have no corresponding real-world source. Discovered hallucinations give the adversary potential targets for Publish Hallucinated Entities. Different LLMs have been shown to produce the same hallucinations, so hallucinations exploited by an adversary may also affect users of other LLMs.
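The discovery step described above can be sketched from a defender's point of view, for example when auditing which nonexistent package names several models tend to repeat. This is a minimal illustration, not a definitive method: the `KNOWN_PACKAGES` set, the `pip install` pattern, the model names, and the sample responses (including the made-up package `fastjsonx`) are all assumptions introduced here.

```python
import re
from collections import Counter

# Hypothetical snapshot of names that exist in a package registry.
KNOWN_PACKAGES = {"requests", "numpy", "flask"}

# Illustrative pattern: captures the name in "pip install <name>".
INSTALL_RE = re.compile(r"pip install ([A-Za-z0-9_.\-]+)")


def extract_packages(response: str) -> set[str]:
    """Return the lowercased package names found in install commands."""
    return {name.lower() for name in INSTALL_RE.findall(response)}


def unknown_package_counts(responses_by_model: dict[str, str]) -> Counter:
    """Count, for each unknown package name, how many models suggested it."""
    counts: Counter = Counter()
    for response in responses_by_model.values():
        for name in extract_packages(response) - KNOWN_PACKAGES:
            counts[name] += 1
    return counts


# Made-up responses standing in for output from two different models.
responses = {
    "model_a": "Run pip install requests and then pip install fastjsonx",
    "model_b": "Try pip install fastjsonx or pip install numpy",
}
counts = unknown_package_counts(responses)
```

A name that several models suggest but that is absent from the registry snapshot is the kind of recurring hallucination the paragraph describes, so it may be worth monitoring or pre-registering defensively.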
- ATLAS ID: AML.T0062
- Priority score: 42
Mitigations
Defenses that may help against this attack.
AML.M0020 - Generative AI Guardrails
Guardrails can help block hallucinated content that appears in model output.
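One possible shape for such a guardrail is an output filter that removes links the deployment cannot verify. The sketch below assumes a hypothetical allowlist, `VERIFIED_HOSTS`, and a simple URL pattern; real guardrail products work differently and in more depth, so treat this only as an illustration of the idea.

```python
import re
from urllib.parse import urlparse

# Hypothetical allowlist of hostnames the deployment has verified.
VERIFIED_HOSTS = {"pypi.org", "docs.python.org"}

# Illustrative pattern for http(s) URLs in model output.
URL_RE = re.compile(r"https?://[^\s)\"'>]+")


def redact_unverified_urls(text: str) -> str:
    """Replace every URL whose host is not on the allowlist."""

    def check(match: re.Match) -> str:
        # urlparse lowercases the hostname; strip a trailing dot if present.
        host = (urlparse(match.group(0)).hostname or "").rstrip(".")
        if host in VERIFIED_HOSTS:
            return match.group(0)
        return "[unverified link removed]"

    return URL_RE.sub(check, text)


# Made-up model output; the second host is a placeholder, not a real site.
output = "See https://pypi.org/project/requests/ or https://fastjsonx-docs.example/install"
filtered = redact_unverified_urls(output)
```

An allowlist is deliberately conservative: it also removes legitimate links that have not been reviewed, which may be an acceptable trade-off where hallucinated URLs are a concern.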
AML.M0021 - Generative AI Guidelines
Guidelines can instruct the model to avoid producing hallucinated content.
AML.M0022 - Generative AI Model Alignment
Model alignment can help steer the model away from hallucinated content.
AML.M0004 - Restrict Number of AI Model Queries
Restricting the number of model queries limits or slows an adversary's ability to identify possible hallucinations.
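A query restriction of this kind could be sketched as a per-client sliding-window limiter. The class name `QueryLimiter`, its parameters, and the client identifier are hypothetical and chosen only for illustration; a production service would more likely rely on its API gateway's rate limiting.

```python
from collections import defaultdict, deque


class QueryLimiter:
    """Allow at most max_queries per client within a sliding time window."""

    def __init__(self, max_queries: int, window_seconds: float) -> None:
        self.max_queries = max_queries
        self.window_seconds = window_seconds
        self._history: dict[str, deque] = defaultdict(deque)

    def allow(self, client_id: str, now: float) -> bool:
        """Record the query and return True if it is within the limit."""
        history = self._history[client_id]
        # Drop timestamps that have aged out of the window.
        while history and now - history[0] >= self.window_seconds:
            history.popleft()
        if len(history) >= self.max_queries:
            return False
        history.append(now)
        return True


# Two queries per 60 seconds; timestamps are passed in explicitly.
limiter = QueryLimiter(max_queries=2, window_seconds=60.0)
decisions = [limiter.allow("client-1", t) for t in (0.0, 1.0, 2.0, 61.0)]
```

Passing the timestamp as an argument keeps the sketch deterministic; in a running service the caller would supply something like `time.monotonic()`.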
Case studies
Examples from public reports and exercises.
ChatGPT Package Hallucination
Researchers found that large language models such as ChatGPT can hallucinate names of software packages that have not been published to a package repository. An attacker could publish a malicious package under the hallucinated name to a package repository. Users of the same or similar large language models may then encounter the same hallucination and ultimately download and execute the malicious package, leading to a variety of potential harms.
Source
Where this page's information comes from.