PromptRiskDB: Threat intelligence atlas

Generative AI Guardrails - AI Mitigation

AI Mitigation, ML Model Engineering, ML Model Evaluation, Deployment, Technical - ML

Record summary

A quick snapshot of what this page covers.

Techniques: 8. Attacks this defense is designed to help with.
Lifecycle: 3. Where this defense applies in the AI lifecycle.
Categories: 1. How the source groups this defense.

Control summary

What this defense is meant to help prevent.

Guardrails are safety controls placed between a generative AI model and the user. They screen both the prompt sent to the model and the response returned to the user, so that undesired inputs and outputs can be blocked. Guardrails can take the form of validators such as filters, rule-based logic, or regular expressions, as well as AI-based approaches such as classifiers, LLM-based evaluators, or named entity recognition (NER) that assess the safety of the prompt or response. Domain-specific methods can be employed to reduce risks in areas such as etiquette, brand damage, jailbreaking, false information, code exploits, SQL injection, and data leakage.
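
To make the validator idea concrete, here is a minimal sketch of a regex-based guardrail that wraps a model call: it checks the prompt before the model sees it and redacts the response before the user sees it. Everything named here (`INPUT_RULES`, `OUTPUT_RULES`, `Verdict`, `check_input`, `redact_output`, `guarded_call`) is hypothetical and chosen for illustration. The patterns are deliberately simplistic and are not taken from the source; a real deployment would likely combine such rules with classifiers or NER.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List

# Illustrative rule sets only (assumptions, not from the source record).
# Real guardrails need much broader and more robust coverage.
INPUT_RULES = {
    "jailbreak": re.compile(
        r"ignore (all|any|previous|prior) .*instructions", re.IGNORECASE
    ),
    "sql_injection": re.compile(
        r"('|\")\s*(or|and)\s+1\s*=\s*1|;\s*drop\s+table", re.IGNORECASE
    ),
}
OUTPUT_RULES = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


@dataclass
class Verdict:
    allowed: bool                 # False when the request was blocked
    text: str                     # text that is safe to pass along
    reasons: List[str] = field(default_factory=list)  # names of rules that fired


def check_input(prompt: str) -> Verdict:
    """Block the prompt if any input rule matches."""
    reasons = [name for name, rx in INPUT_RULES.items() if rx.search(prompt)]
    return Verdict(allowed=not reasons, text=prompt, reasons=reasons)


def redact_output(response: str) -> Verdict:
    """Replace sensitive-looking spans in the response instead of blocking it."""
    reasons = []
    for name, rx in OUTPUT_RULES.items():
        response, count = rx.subn(f"[REDACTED {name.upper()}]", response)
        if count:
            reasons.append(name)
    return Verdict(allowed=True, text=response, reasons=reasons)


def guarded_call(model: Callable[[str], str], prompt: str) -> Verdict:
    """Run the input check, call the model, then filter its output."""
    pre = check_input(prompt)
    if not pre.allowed:
        return Verdict(False, "Request blocked by input guardrail.", pre.reasons)
    return redact_output(model(prompt))
```

In this sketch `model` is any callable that maps a prompt string to a response string, so `guarded_call(my_model, user_prompt)` returns a `Verdict` whose `reasons` field records which rules fired. Blocking on input and redacting on output is one possible design choice; either side could instead block, redact, or escalate to a slower AI-based check.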

ATLAS ID: AML.M0020
Priority score: 40
Lifecycle: ML Model Engineering, ML Model Evaluation, Deployment
Category: Technical - ML

Covered techniques

Attacks this defense is designed to help with.

Source

Where this page information comes from.