ML Model Evaluation AI Mitigations

ML Lifecycle Stage

ML Model Evaluation groups 10 AI defenses across the ML lifecycle.

Overview

A group of defenses with the same label.

Records10Records included in this view.

SourcePublicBuilt from public source data.

ModeStaticPrepared as a ready-to-read page.

Lifecycle stage

How ATLAS labels this defense group.

ML lifecycle stage: ML Model Evaluation
Mitigation count: 10

ML Model Evaluation

Related defenses

Defenses included in this group.

10 recordsView all mitigations →

Adversarial Input Detection

Detect and block adversarial inputs or atypical queries that deviate from known benign behavior, exhibit behavior patterns observed in previous attacks or that come from potentially malicious IPs. Incorporate adversarial detection algorithms into the AI system prior to the AI model.

LifecycleData Preparation + 4 moreCategoryTechnical - ML

Data PreparationML Model Engineering+3 more

Control Access to AI Models and Data at Rest

Establish access controls on internal model registries and limit internal access to production models. Limit access to training data only to approved users.

LifecycleBusiness and Data Understanding + 3 moreCategoryPolicy

B&D UnderstandingData Preparation+2 more

Deepfake Detection

Apply deepfake detection algorithms against any untrusted or user-provided data, especially in impactful applications such as biometric verification, to block generated content.

Detectors may use a combination of approaches, including:

AI models trained to differentiate between real and deepfake content.
Identifying common inconsistencies in deepfake content, such as unnatural facial movements, audio mismatches, or pixel-level artifacts.
Biometrics analysis, such blinking, eye movements, and microexpressions.

LifecycleDeployment + 3 moreCategoryTechnical - ML

DeploymentMonitoring+2 more

Generative AI Guardrails

Guardrails are safety controls that are placed between a generative AI model and the output shared with the user to prevent undesired inputs and outputs. Guardrails can take the form of validators such as filters, rule-based logic, or regular expressions, as well as AI-based approaches, such as classifiers and utilizing LLMs, or named entity recognition (NER) to evaluate the safety of the prompt or response. Domain specific methods can be employed to reduce risks in a variety of areas such as etiquette, brand damage, jailbreaking, false information, code exploits, SQL injections, and data leakage.

LifecycleML Model Engineering + 2 moreCategoryTechnical - ML

ML Model EngineeringML Model Evaluation+1 more

Showing 4 of 10

Source evidence

Original public records and references for this page.

View all sources →

Original source

Original source links

Open the public records and source datasets used for this page.

Repositoryhttps://github.com/mitre-atlas/atlas-data ATLAS.yamlhttps://github.com/mitre-atlas/atlas-data/blob/main/dist/ATLAS.yaml Schemahttps://github.com/mitre-atlas/atlas-data/blob/main/dist/schemas/atlas_output_schema.json