APromptRiskDBThreat intelligence atlas
ML Lifecycle Stage

Data Preparation AI Mitigations

Data Preparation groups 13 AI defenses across the ML lifecycle.

ML Lifecycle StageData Preparation

Record summary

A quick snapshot of what this page covers.

Records13Records included in this view.
SourcePublicBuilt from public source data.
ModeStaticPrepared as a ready-to-read page.

Lifecycle stage

A group of defenses with the same label.

13 AI defenses are grouped under Data Preparation.

ML lifecycle stage
Data Preparation
Mitigation count
13
Data Preparation

Related defenses

Defenses included in this group.

AI Bill of Materials

Business and Data UnderstandingData Preparation+1 more
LifecycleBusiness and Data Understanding + 2 moreCategoryPolicy

An AI Bill of Materials (AI BOM) contains a full listing of artifacts and resources that were used in building the AI. The AI BOM can help mitigate supply chain risks and enable rapid response to reported vulnerabilities.

This can include maintaining dataset provenance, i.e. a detailed history of datasets used for AI applications. The history can include information about the dataset source as well as well as a complete record of any modifications.

Adversarial Input Detection

Data PreparationML Model Engineering+3 more
LifecycleData Preparation + 4 moreCategoryTechnical - ML

Detect and block adversarial inputs or atypical queries that deviate from known benign behavior, exhibit behavior patterns observed in previous attacks or that come from potentially malicious IPs. Incorporate adversarial detection algorithms into the AI system prior to the AI model.

Control Access to AI Models and Data at Rest

Business and Data UnderstandingData Preparation+2 more
LifecycleBusiness and Data Understanding + 3 moreCategoryPolicy

Establish access controls on internal model registries and limit internal access to production models. Limit access to training data only to approved users.

Encrypt Sensitive Information

Data PreparationML Model Engineering+1 more
LifecycleData Preparation + 2 moreCategoryTechnical - Cyber

Encrypt sensitive data such as AI models to protect against adversaries attempting to access sensitive data.

Input Restoration

Data PreparationML Model Evaluation+2 more
LifecycleData Preparation + 3 moreCategoryTechnical - ML

Preprocess all inference data to nullify or reverse potential adversarial perturbations.

Input and Output Validation for AI Agent Components

Business and Data UnderstandingData Preparation+1 more
LifecycleBusiness and Data Understanding + 2 moreCategoryTechnical - ML

Implement validation on inputs and outputs for the tools and data sources used by AI agents. Validation includes enforcing a common data format, schema validation, checks for sensitive or prohibited information leakage, and data sanitization to remove potential injections or unsafe code. Input and output validation can help prevent compromises from spreading in AI-enabled systems and can help secure the workflow when multiple components are chained together. Validation should be performed external to the AI agent.

Maintain AI Dataset Provenance

Data PreparationBusiness and Data Understanding
LifecycleData Preparation + 1 moreCategoryTechnical - ML

Maintain a detailed history of datasets used for AI applications. The history should include information about the dataset's source as well as a complete record of any modifications.

Model Hardening

Data PreparationML Model Engineering
LifecycleData Preparation + 1 moreCategoryTechnical - ML

Use techniques to make AI models robust to adversarial inputs such as adversarial training or network distillation.

Sanitize Training Data

Business and Data UnderstandingData Preparation+1 more
LifecycleBusiness and Data Understanding + 2 moreCategoryTechnical - ML

Detect and remove or remediate poisoned training data. Training data should be sanitized prior to model training and recurrently for an active learning model.

Implement a filter to limit ingested training data. Establish a content policy that would remove unwanted content such as certain explicit or offensive language from being used.

Use Multi-Modal Sensors

Business and Data UnderstandingData Preparation+1 more
LifecycleBusiness and Data Understanding + 2 moreCategoryTechnical - Cyber

Incorporate multiple sensors to integrate varying perspectives and modalities to avoid a single point of failure susceptible to physical attacks.

User Training

Business and Data UnderstandingData Preparation+4 more
LifecycleBusiness and Data Understanding + 5 moreCategoryPolicy

Educate AI model developers to on AI supply chain risks and potentially malicious AI artifacts. Educate users on how to identify deepfakes and phishing attempts.

Verify AI Artifacts

Business and Data UnderstandingData Preparation+1 more
LifecycleBusiness and Data Understanding + 2 moreCategoryTechnical - Cyber

Verify the cryptographic checksum of all AI artifacts to verify that the file was not modified by an attacker.

Vulnerability Scanning

ML Model EngineeringData Preparation
LifecycleML Model Engineering + 1 moreCategoryTechnical - Cyber

Vulnerability scanning is used to find potentially exploitable software vulnerabilities to remediate them.

File formats such as pickle files that are commonly used to store AI models can contain exploits that allow for arbitrary code execution. These files should be scanned for potentially unsafe calls, which could be used to execute code, create new processes, or establish networking capabilities. Adversaries may embed malicious code in model corrupt model files, so scanners should be capable of working with models that cannot be fully de-serialized. Model artifacts, downstream products produced by models, and external software dependencies should be scanned for known vulnerabilities.

Source

Where this page information comes from.