Generative AI Guardrails - AI Mitigation

Overview

A source-backed snapshot of this defense.

Guardrails are safety controls placed between a generative AI model and its users. They screen both the prompts sent to the model and the responses returned, blocking undesired inputs and outputs. Guardrails can take the form of validators, such as filters, rule-based logic, or regular expressions, or of AI-based approaches, such as classifiers, LLM-based evaluators, or named entity recognition (NER), that evaluate the safety of the prompt or response. Domain-specific methods can be employed to reduce risks in areas such as etiquette, brand damage, jailbreaking, false information, code exploits, SQL injection, and data leakage.
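As an illustration of the rule-based validators described above, the following is a minimal sketch of a guardrail that screens the prompt before it reaches the model and the response before it reaches the user. The rule names, the regular expressions, and the `guarded_generate` wrapper are assumptions made for this example and are not taken from the ATLAS record; a real deployment would tune the rules per domain and would likely combine them with classifier- or NER-based checks.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Pattern

# Hypothetical input rules: crude signals for jailbreak and SQL injection attempts.
INPUT_RULES: Dict[str, Pattern[str]] = {
    "jailbreak": re.compile(
        r"ignore (all|any|previous|prior) .*instructions", re.IGNORECASE
    ),
    "sql_injection": re.compile(
        r"('|\")\s*(or|and)\s+\d+\s*=\s*\d+|;\s*drop\s+table", re.IGNORECASE
    ),
}

# Hypothetical output rules: crude signals for data leakage in the response.
OUTPUT_RULES: Dict[str, Pattern[str]] = {
    "email_leak": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "ssn_leak": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


@dataclass
class Verdict:
    allowed: bool
    violations: List[str] = field(default_factory=list)


def check(text: str, rules: Dict[str, Pattern[str]]) -> Verdict:
    """Return which named rules match the text; any match means 'not allowed'."""
    hits = [name for name, pattern in rules.items() if pattern.search(text)]
    return Verdict(allowed=not hits, violations=hits)


def guarded_generate(prompt: str, model: Callable[[str], str]) -> str:
    """Wrap a model call with an input check and an output check."""
    pre = check(prompt, INPUT_RULES)
    if not pre.allowed:
        return "Request blocked: " + ", ".join(pre.violations)
    response = model(prompt)
    post = check(response, OUTPUT_RULES)
    if not post.allowed:
        return "Response withheld: " + ", ".join(post.violations)
    return response
```

Here `model` stands in for any callable that maps a prompt string to a response string. Regular expressions like these are easy to audit but also easy to evade, which is one reason the source lists AI-based approaches alongside them.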

Techniques: 8. Attacks this defense is designed to help with.

Lifecycle: 3. Where this defense applies in the AI lifecycle.

Categories: 1. How the source groups this defense.

Safeguard details

Where this defense applies and how the source classifies it.

ATLAS ID: AML.M0020
Priority score: 40

Lifecycle: ML Model Engineering, ML Model Evaluation, Deployment
Category: Technical - ML

Covered techniques

Attacks this defense is designed to help with.

8 records (showing 4 of 8)

Source evidence

Original public records and references for this page.

Original source

Original source links

Open the public records and source datasets used for this page.

Repository: https://github.com/mitre-atlas/atlas-data
ATLAS.yaml: https://github.com/mitre-atlas/atlas-data/blob/main/dist/ATLAS.yaml
Schema: https://github.com/mitre-atlas/atlas-data/blob/main/dist/schemas/atlas_output_schema.json

Covered techniques

AML.T0053 - AI Agent Tool Invocation

AML.T0010 - AI Supply Chain Compromise

AML.T0062 - Discover LLM Hallucinations

AML.T0056 - Extract LLM System Prompt
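To make one of the listed techniques concrete, here is a minimal sketch of an output guardrail aimed at system prompt extraction (AML.T0056). It flags a response that reproduces any run of consecutive words from the system prompt. The function names, the six-word window, and the refusal message are assumptions chosen for illustration; the ATLAS record does not prescribe a detection method, and paraphrased or translated leaks would pass this check.

```python
import re
from typing import List


def _words(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def leaks_system_prompt(response: str, system_prompt: str, window: int = 6) -> bool:
    """Return True if the response repeats `window` consecutive prompt words."""
    prompt_words = _words(system_prompt)
    if not prompt_words:
        return False
    # Pad with spaces so matches align on whole-word boundaries.
    haystack = " " + " ".join(_words(response)) + " "
    if len(prompt_words) < window:
        # Short prompt: only a full verbatim repeat counts as a leak.
        return " " + " ".join(prompt_words) + " " in haystack
    for start in range(len(prompt_words) - window + 1):
        chunk = " " + " ".join(prompt_words[start:start + window]) + " "
        if chunk in haystack:
            return True
    return False


def guard_output(response: str, system_prompt: str) -> str:
    """Replace a leaking response with a fixed refusal (hypothetical wording)."""
    if leaks_system_prompt(response, system_prompt):
        return "Sorry, I can't share that."
    return response
```

A smaller window catches shorter excerpts at the cost of more false positives on common phrases, so the value would need tuning against real traffic.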
