Discover LLM Hallucinations - AI Security Technique

Overview

A source-backed snapshot of this AI security technique.

Adversaries may prompt large language models and identify hallucinated entities. They may request software packages, commands, URLs, organization names, or e-mail addresses, and identify hallucinations with no connected real-world source. Discovered hallucinations provide the adversary with potential targets to Publish Hallucinated Entities. Different LLMs have been shown to produce the same hallucinations, so the hallucinations exploited by an adversary may affect users of other LLMs.
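Defenders can audit their own exposure by sampling the models they deploy and checking the suggested names against an authoritative list. The sketch below is a minimal illustration, not an ATLAS-provided method. `suggestions_by_model` is a hypothetical mapping from a model label to the package names that model suggested, and `known_packages` stands in for a snapshot of a real registry index. It counts how many models suggested each name that is missing from the registry, since names that recur across models are the ones most likely to reach other users.

```python
from collections import Counter


def recurring_unknown_names(suggestions_by_model, known_packages):
    """Map each name missing from the registry to the number of models suggesting it.

    Both arguments are hypothetical inputs used for illustration only.
    """
    known = {name.lower() for name in known_packages}
    counts = Counter()
    for names in suggestions_by_model.values():
        # Deduplicate per model so repeated output from one model
        # does not inflate the cross-model count.
        unknown = {n.lower() for n in names if n.lower() not in known}
        counts.update(unknown)
    return dict(counts)
```

Names with a count above one could be prioritized for blocking or defensive registration, assuming the registry snapshot is current.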

Tactics: 1. Attacker goals connected to this method.

Mitigations: 4. Defenses that may help against this attack.

AI risks: 0. Research-backed risks connected to this topic.

Technique details

Identifiers, maturity, and source taxonomy for this technique.

ATLAS ID: AML.T0062
Maturity: demonstrated
Priority score: 42

ATLAS tactics

Discovery

Attack flow

How to read the public records connected to this technique.

1. Technique: Read the ATLAS description and evidence level.

2. Tactics: See which attacker goals this method supports.

3. Examples: Check whether public case studies mention it.

4. Defenses: Review safeguards mapped by ATLAS.

5. Sources: Open the original public records and references.

Impact

Why this technique may deserve attention in the current dataset.

Evidence level: demonstrated
Mapped defenses: 4 ATLAS mitigation records
Public examples: 1 linked case study record
Research risks: 0 related MIT AI Risk records above the confidence threshold
Vulnerabilities: 0 linked CVE records

Mitigations

Defenses that may help against this attack.

4 records

AML.M0020 - Generative AI Guardrails

Guardrails can help block hallucinated content that appears in model output.
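One way such a guardrail might look for the package-name case is an output filter that flags install commands naming packages outside a verified list. This is a minimal sketch under stated assumptions: `flag_unverified_installs` and `verified_packages` are hypothetical names, the pattern only recognizes the simple form `pip install <name>`, and a production guardrail would need to handle flags, version specifiers, and other package managers.

```python
import re

# Captures the first package name after "pip install" (a deliberate simplification).
# The name must start and end with an alphanumeric character, so trailing
# sentence punctuation is not captured.
INSTALL_PATTERN = re.compile(
    r"pip install\s+([A-Za-z0-9](?:[A-Za-z0-9._-]*[A-Za-z0-9])?)"
)


def flag_unverified_installs(model_output, verified_packages):
    """Return package names in install commands that are not on the verified list."""
    verified = {name.lower() for name in verified_packages}
    flagged = []
    for match in INSTALL_PATTERN.finditer(model_output):
        name = match.group(1)
        if name.lower() not in verified and name not in flagged:
            flagged.append(name)
    return flagged
```

A caller could block the response, strip the command, or attach a warning whenever the returned list is non-empty.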

Lifecycle: ML Model Engineering, ML Model Evaluation, +1 more
Category: Technical - ML

AML.M0021 - Generative AI Guidelines

Guidelines can instruct the model to avoid producing hallucinated content.

Lifecycle: ML Model Engineering, ML Model Evaluation, +1 more
Category: Technical - ML

AML.M0022 - Generative AI Model Alignment

Model alignment can help steer the model away from hallucinated content.

Lifecycle: ML Model Engineering, ML Model Evaluation, +1 more
Category: Technical - ML

AML.M0004 - Restrict Number of AI Model Queries

Restricting the number of model queries limits or slows an adversary's ability to identify possible hallucinations.
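A per-client sliding-window limit is one common way to restrict query volume. The sketch below is illustrative only: `QueryLimiter`, `max_queries`, and `window_seconds` are hypothetical names, and the caller supplies the current time (for example from `time.monotonic()`) so the logic stays easy to test.

```python
from collections import defaultdict, deque


class QueryLimiter:
    """Allow at most max_queries per client within a sliding time window."""

    def __init__(self, max_queries, window_seconds):
        self.max_queries = max_queries
        self.window_seconds = window_seconds
        self._history = defaultdict(deque)  # client_id -> recent query timestamps

    def allow(self, client_id, now):
        """Return True and record the query if the client is under its limit."""
        timestamps = self._history[client_id]
        # Drop timestamps that have aged out of the window.
        while timestamps and now - timestamps[0] >= self.window_seconds:
            timestamps.popleft()
        if len(timestamps) >= self.max_queries:
            return False
        timestamps.append(now)
        return True
```

Rejected queries are not recorded, so a client that keeps retrying while blocked does not extend its own lockout. Whether that is the desired behavior depends on the deployment.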

Lifecycle: Business and Data Understanding, Deployment, +1 more
Category: Technical - Cyber

Case studies

Examples from public reports and exercises.

1 record

ChatGPT Package Hallucination

Researchers identified that large language models such as ChatGPT can hallucinate software package names that are not published to any package repository. An attacker could publish a malicious package under the hallucinated name. Users of the same or similar large language models may then encounter the same hallucination and ultimately download and execute the malicious package, leading to a variety of potential harms.

Date: 2024-06-01

exercise