Novel Attacks on LLMs - PromptRiskDB

Record summary

A quick snapshot of what this page covers.

Techniques: 4. Attack methods connected to this risk.

Mitigations: 3. Defenses that may help with related attacks.

Domain: 2. Privacy & Security. The broad risk area this belongs to.

Risk profile

How this risk is described and categorized.

Domain: 2. Privacy & Security

Subdomain: 2.2 > AI system security vulnerabilities and attacks

Entity: 1 - Human

Intent: 1 - Intentional

Timing: 3 - Other

Category: Model Attacks

Subcategory: Novel Attacks on LLMs

Related techniques

Attack methods connected to this risk.

AML.T0080.001 - Thread

demonstrated

Method: taxonomy_keyword_rule

Confidence: 59%

AML.T0017 - Develop Capabilities

realized

Method: taxonomy_keyword_rule

Confidence: 58%

AML.T0056 - Extract LLM System Prompt

feasible

Method: taxonomy_keyword_rule

Confidence: 56%

AML.T0066 - Retrieval Content Crafting

demonstrated

Method: taxonomy_keyword_rule

Confidence: 55%

Suggested mitigations

Defenses that may help with related attacks.

Generative AI Guardrails

ML Model Engineering, ML Model Evaluation, +1 more

Lifecycle: ML Model Engineering + 2 more

Category: Technical - ML

Generative AI Guidelines

ML Model Engineering, ML Model Evaluation, +1 more

Lifecycle: ML Model Engineering + 2 more

Category: Technical - ML

Generative AI Model Alignment

ML Model Engineering, ML Model Evaluation, +1 more

Lifecycle: ML Model Engineering + 2 more

Category: Technical - ML

Source

Research source for this risk, when available.

Included resource

Risk Taxonomy, Mitigation, and Assessment Benchmarks of Large Language Model Systems

Authors: Cui et al.

Year: 2024

Type: Preprint

DOI: 10.48550/arXiv.2401.05778

URL: https://arxiv.org/abs/2401.05778

Original source

MIT AI Risk Repository

Open the public repository used for AI risk records and taxonomy fields.

Repository: https://airisk.mit.edu/