Control summary
Guardrails are safety controls placed around a generative AI model. They screen both the prompts the model receives and the responses shared with the user, blocking undesired inputs and outputs. Guardrails can take the form of validators such as filters, rule-based logic, or regular expressions, as well as AI-based approaches such as classifiers, LLMs used as evaluators, or named entity recognition (NER) that assess the safety of a prompt or response. Domain-specific methods can be employed to reduce risks in a variety of areas, such as etiquette, brand damage, jailbreaking, false information, code exploits, SQL injection, and data leakage.
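The description above places validators on both sides of the model. The sketch below shows that shape with rule-based (regular-expression) validators only; the `Validator` type, the `guarded_call` wrapper, the refusal text, and the echoing stand-in model are all hypothetical and not part of any specific guardrail framework.

```python
import re
from typing import Callable, List, Optional

# A validator returns None when text passes, or a short reason string when it should be blocked.
Validator = Callable[[str], Optional[str]]

REFUSAL = "Sorry, I can't help with that."


def regex_validator(pattern: str, reason: str) -> Validator:
    """Build a rule-based validator that flags text matching a regular expression."""
    compiled = re.compile(pattern, re.IGNORECASE)

    def check(text: str) -> Optional[str]:
        return reason if compiled.search(text) else None

    return check


def guarded_call(
    prompt: str,
    model: Callable[[str], str],
    input_validators: List[Validator],
    output_validators: List[Validator],
) -> str:
    """Screen the prompt, call the model, then screen the response before returning it."""
    if any(check(prompt) is not None for check in input_validators):
        return REFUSAL
    response = model(prompt)
    if any(check(response) is not None for check in output_validators):
        return REFUSAL
    return response


# Usage with a stand-in model that simply echoes the prompt.
def echo_model(prompt: str) -> str:
    return f"You said: {prompt}"


input_checks = [regex_validator(r"ignore (all )?previous instructions", "possible prompt injection")]
output_checks = [regex_validator(r"\b\d{3}-\d{2}-\d{4}\b", "looks like a US SSN")]
print(guarded_call("hello", echo_model, input_checks, output_checks))  # You said: hello
```

Because a validator is just a callable from text to an optional reason, an AI-based check (a classifier score, an NER pass, or an LLM judge) could be dropped into either list without changing the wrapper.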
- ATLAS ID: AML.M0020
- Priority score: 40
Covered techniques
AML.T0053 - AI Agent Tool Invocation
Guardrails can block harmful inputs that could lead to plugin compromise and can detect PII in model outputs.
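One way to keep harmful inputs from reaching a plugin is to validate each tool call the model proposes before it runs. The sketch below assumes an allowlist of tools with permitted argument names and a string-length cap; `ToolPolicy`, `POLICIES`, `validate_tool_call`, and the tool names are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Dict, FrozenSet, List


@dataclass(frozen=True)
class ToolPolicy:
    """Hypothetical per-tool policy: permitted argument names and a cap on string length."""
    allowed_args: FrozenSet[str]
    max_str_len: int = 200


# Hypothetical tool registry; the tool names are illustrative only.
POLICIES: Dict[str, ToolPolicy] = {
    "search_docs": ToolPolicy(allowed_args=frozenset({"query"})),
    "get_weather": ToolPolicy(allowed_args=frozenset({"city"}), max_str_len=64),
}


def validate_tool_call(name: str, args: Dict[str, Any]) -> List[str]:
    """Return policy violations for a model-proposed tool call; an empty list means it may run."""
    policy = POLICIES.get(name)
    if policy is None:
        return [f"tool {name!r} is not on the allowlist"]
    problems = []
    for key, value in args.items():
        if key not in policy.allowed_args:
            problems.append(f"unexpected argument {key!r}")
        elif isinstance(value, str) and len(value) > policy.max_str_len:
            problems.append(f"argument {key!r} exceeds {policy.max_str_len} characters")
    return problems
```

Denying unknown tools by default, rather than listing forbidden ones, means a prompt that talks the model into calling an unexpected tool is rejected without anyone having to anticipate it.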
AML.T0010 - AI Supply Chain Compromise
Guardrails can detect harmful code in model outputs.
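A rule-based way to detect harmful code in model output is to parse generated Python and look for risky calls. The sketch below uses the standard-library `ast` module; the `DANGEROUS_CALLS` denylist and the function names are illustrative assumptions. It only matches literal dotted names, so aliased imports (for example `import os as o`) would slip past it.

```python
import ast
from typing import List

# Illustrative denylist; a real policy would be broader and tuned to the deployment.
DANGEROUS_CALLS = {
    "eval", "exec", "compile", "__import__",
    "os.system", "os.popen",
    "subprocess.run", "subprocess.Popen", "subprocess.call",
    "pickle.loads",
}


def _call_name(node: ast.AST) -> str:
    """Reconstruct a dotted name such as 'os.system' from a call's func node, if possible."""
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Attribute):
        base = _call_name(node.value)
        return f"{base}.{node.attr}" if base else node.attr
    return ""


def find_dangerous_calls(code: str) -> List[str]:
    """Return denylisted call names found in generated Python code.

    Code that fails to parse is reported rather than silently accepted.
    """
    try:
        tree = ast.parse(code)
    except SyntaxError:
        return ["<unparseable code>"]
    hits = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Call):
            name = _call_name(node.func)
            if name in DANGEROUS_CALLS:
                hits.append(name)
    return hits
```

Parsing instead of regex-matching the source avoids false positives from the same words appearing in comments or string literals.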
AML.T0062 - Discover LLM Hallucinations
Guardrails can help block hallucinated content that appears in model output.
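One concrete kind of hallucinated content is a package name in generated code that does not correspond to a real, trusted dependency. The sketch below assumes the deployment keeps an allowlist of known packages and flags any import outside it; `imported_packages`, `unknown_packages`, and the example name `totally_made_up_pkg` are hypothetical. Relative imports are ignored, and code that fails to parse raises `SyntaxError`.

```python
import ast
from typing import Iterable, List, Set


def imported_packages(code: str) -> Set[str]:
    """Collect top-level package names imported by a snippet of Python code."""
    names: Set[str] = set()
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names.add(node.module.split(".")[0])
    return names


def unknown_packages(code: str, known: Iterable[str]) -> List[str]:
    """Return imported packages missing from the trusted allowlist, sorted for stable output."""
    return sorted(imported_packages(code) - set(known))


# Usage: 'totally_made_up_pkg' stands in for a hallucinated dependency.
snippet = "import numpy as np\nfrom totally_made_up_pkg.core import load\nimport os.path"
print(unknown_packages(snippet, {"numpy", "os"}))
```

Checking against an allowlist, not a public package index, matters here: an attacker who has discovered a hallucinated name may already have published a package under it.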
AML.T0056 - Extract LLM System Prompt
Guardrails can block harmful inputs that could lead to system prompt (meta prompt) extraction.
AML.T0057 - LLM Data Leakage
Guardrails can detect sensitive data and PII in model outputs.
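A simple output-side PII check can run regular expressions over the response and replace matches with placeholders. The patterns and labels below are illustrative assumptions covering only emails and US-style SSNs and phone numbers; regexes alone miss many PII forms, which is where NER-based detection mentioned above can complement them.

```python
import re
from typing import List, Tuple

# Illustrative patterns only; they are neither exhaustive nor locale-aware.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "US_PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}


def redact_pii(text: str) -> Tuple[str, List[str]]:
    """Replace each PII match with a [LABEL] placeholder and report which labels fired."""
    found = []
    for label, pattern in PII_PATTERNS.items():
        text, count = pattern.subn(f"[{label}]", text)
        if count:
            found.append(label)
    return text, found


print(redact_pii("Contact jane.doe@example.com or 555-123-4567."))
```

Returning the fired labels alongside the redacted text lets the caller choose between redacting and blocking the response outright.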
AML.T0054 - LLM Jailbreak
Guardrails can block harmful inputs that could lead to a jailbreak.
AML.T0051 - LLM Prompt Injection
Guardrails can block harmful inputs that could lead to prompt injection.
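For the input-side entries above (jailbreak, prompt injection, and system prompt extraction), a first-pass filter can normalize the prompt and match it against known attack phrasings. The phrase list below is a hypothetical example; attackers can paraphrase around any fixed list, so in practice this kind of filter would typically sit in front of a classifier or LLM-based check rather than replace it.

```python
import re
import unicodedata
from typing import List

# Hypothetical phrases typical of injection, jailbreak, and extraction attempts.
SUSPICIOUS_PATTERNS = [
    r"ignore (?:all |any )?(?:previous|prior|above) instructions",
    r"disregard (?:the |your )?(?:system prompt|instructions)",
    r"(?:reveal|print|repeat) (?:your|the) (?:system prompt|instructions)",
    r"\byou are now (?:DAN|in developer mode)\b",
]
_COMPILED = [re.compile(p, re.IGNORECASE) for p in SUSPICIOUS_PATTERNS]


def normalize(text: str) -> str:
    """Apply NFKC (folding e.g. full-width letters to ASCII) and collapse whitespace runs."""
    text = unicodedata.normalize("NFKC", text)
    return " ".join(text.split())


def flag_prompt(prompt: str) -> List[str]:
    """Return the patterns that match the normalized prompt; an empty list means no flag."""
    cleaned = normalize(prompt)
    return [p.pattern for p in _COMPILED if p.search(cleaned)]
```

Normalizing first defeats trivial obfuscation such as extra line breaks or full-width characters, though NFKC does not fold cross-script look-alikes (for example Cyrillic letters that resemble Latin ones).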
AML.T0061 - LLM Prompt Self-Replication
Guardrails can help prevent self-replication attacks by screening model inputs and outputs.
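A self-replicating prompt works by getting the model to reproduce the injected text in its output, so one heuristic is to flag responses that copy a long verbatim span of the input. The sketch below uses the standard-library `difflib.SequenceMatcher`; the function names and the 80-character threshold are hypothetical choices, not recommended values.

```python
from difflib import SequenceMatcher


def longest_copied_span(prompt: str, response: str) -> str:
    """Return the longest substring of the prompt that appears verbatim in the response."""
    # autojunk=False keeps frequent characters from being ignored, so this is an exact
    # longest-common-substring search even for long texts.
    matcher = SequenceMatcher(None, prompt, response, autojunk=False)
    match = matcher.find_longest_match(0, len(prompt), 0, len(response))
    return prompt[match.a : match.a + match.size]


def looks_self_replicating(prompt: str, response: str, min_chars: int = 80) -> bool:
    """Flag responses that reproduce at least min_chars consecutive characters of the input."""
    return len(longest_copied_span(prompt, response)) >= min_chars
```

Legitimate tasks such as quoting or summarizing can also repeat input text, so a check like this would usually feed a review or blocking decision alongside other signals rather than act alone.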
Source
Original source