Record summary
A quick snapshot of what this page covers.
Lifecycle stage
A group of defenses with the same label.
10 AI defenses are grouped under Monitoring and Maintenance.
- ML lifecycle stage: Monitoring and Maintenance
- Mitigation count: 10
Related defenses
Defenses included in this group.
AI Telemetry Logging
Implement logging of inputs and outputs of deployed AI models. When deploying AI agents, implement logging of the intermediate steps of agentic actions and decisions, data access and tool use, installation commands, and identity of the agent. Monitoring logs can help to detect security threats and mitigate impacts.
Additionally, having logging enabled can discourage adversaries who want to remain undetected from utilizing AI resources.
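As a minimal sketch of input and output logging, the wrapper below writes one JSON record per model call. The logger name `ai_telemetry`, the `with_telemetry` helper, and the record fields are assumptions made for illustration; a real deployment would also need to record tool use and intermediate agent steps.

```python
import json
import logging
import time
import uuid
from typing import Callable

logger = logging.getLogger("ai_telemetry")  # hypothetical logger name


def with_telemetry(model_fn: Callable[[str], str], agent_id: str) -> Callable[[str], str]:
    """Wrap a model call so each input and output is logged as one JSON record."""

    def wrapped(prompt: str) -> str:
        record = {
            "event_id": str(uuid.uuid4()),
            "timestamp": time.time(),
            "agent_id": agent_id,  # identity of the agent making the call
            "input": prompt,
        }
        try:
            output = model_fn(prompt)
            record["output"] = output
            return output
        except Exception as exc:
            # Failed calls are logged too, then re-raised unchanged.
            record["error"] = repr(exc)
            raise
        finally:
            logger.info(json.dumps(record))

    return wrapped
```

Wrapping at the call boundary keeps logging independent of the model implementation, so the same wrapper could sit in front of any callable that maps a prompt to a response.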
Adversarial Input Detection
Detect and block adversarial inputs or atypical queries that deviate from known benign behavior, exhibit behavior patterns observed in previous attacks, or come from potentially malicious IPs. Incorporate adversarial detection algorithms into the AI system ahead of the AI model, so that inputs are screened before they reach it.
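One way to picture a gate placed in front of the model is the sketch below, which rejects queries from blocked IPs and queries whose length is a statistical outlier against a benign baseline. The `InputGate` class, the use of query length as the only feature, and the z-score threshold are assumptions chosen to keep the example small; production detectors typically use much richer features.

```python
import statistics
from dataclasses import dataclass, field


@dataclass
class InputGate:
    """Hypothetical pre-model filter: IP blocklist plus a length outlier check."""

    benign_lengths: list[float]  # query lengths observed in known benign traffic
    blocked_ips: set[str] = field(default_factory=set)
    z_threshold: float = 3.0  # assumed cutoff, tune per deployment

    def is_allowed(self, query: str, source_ip: str) -> bool:
        if source_ip in self.blocked_ips:
            return False
        mean = statistics.mean(self.benign_lengths)
        stdev = statistics.stdev(self.benign_lengths)
        if stdev == 0:
            # Degenerate baseline: only the exact benign length passes.
            return len(query) == mean
        z_score = abs(len(query) - mean) / stdev
        return z_score <= self.z_threshold
```

A caller would check `gate.is_allowed(query, ip)` and forward the query to the model only when it returns `True`.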
Control Access to AI Models and Data in Production
Require users to verify their identities before accessing a production model. Require authentication for API endpoints and monitor production model queries to ensure compliance with usage policies and to prevent model misuse.
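The sketch below illustrates requiring authentication before a production model is queried. The key store `API_KEY_HASHES`, the example key, and the `query_model` placeholder are all hypothetical; the comparison uses the standard library's constant-time `hmac.compare_digest`.

```python
import hashlib
import hmac

# Hypothetical key store: user id -> SHA-256 hex digest of that user's API key.
API_KEY_HASHES = {
    "alice": hashlib.sha256(b"example-key-for-alice").hexdigest(),
}


def authenticate(user_id: str, api_key: str) -> bool:
    """Return True only if the presented key matches the stored digest."""
    expected = API_KEY_HASHES.get(user_id)
    if expected is None:
        return False
    presented = hashlib.sha256(api_key.encode("utf-8")).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(presented, expected)


def query_model(user_id: str, api_key: str, prompt: str) -> str:
    """Refuse unauthenticated callers before any inference happens."""
    if not authenticate(user_id, api_key):
        raise PermissionError("authentication required")
    return f"model output for: {prompt}"  # placeholder for real inference
```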
Deepfake Detection
Apply deepfake detection algorithms against any untrusted or user-provided data, especially in impactful applications such as biometric verification, to block generated content.
Detectors may use a combination of approaches, including:
- AI models trained to differentiate between real and deepfake content.
- Identifying common inconsistencies in deepfake content, such as unnatural facial movements, audio mismatches, or pixel-level artifacts.
- Biometric analysis, such as blinking, eye movements, and microexpressions.
Input Restoration
Preprocess all inference data to nullify or reverse potential adversarial perturbations.
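The source does not name a specific restoration method. As one illustrative option, the sketch below applies bit-depth reduction, which quantizes pixel values so that small perturbations collapse onto the same level as the clean input. The `reduce_bit_depth` function and the default of 3 bits are assumptions.

```python
def reduce_bit_depth(pixels: list[float], bits: int = 3) -> list[float]:
    """Quantize pixel values in [0, 1] to 2**bits evenly spaced levels."""
    levels = (1 << bits) - 1
    quantized = []
    for value in pixels:
        clipped = min(max(value, 0.0), 1.0)  # keep values inside [0, 1]
        quantized.append(round(clipped * levels) / levels)
    return quantized
```

Quantization only removes perturbations that do not push a value across a level boundary, so this kind of preprocessing reduces, rather than eliminates, adversarial noise.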
Memory Hardening
Memory Hardening involves developing trust boundaries and secure processes for how an AI agent stores and accesses memory and context. This may be implemented using a combination of strategies including restricting an agent's ability to store memories by requiring external authentication and validation for memory updates, performing semantic integrity checks on retrieved memories before agents execute actions, and implementing controls for monitoring of memory and remediation processes for poisoned memory.
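The idea of requiring external authentication for memory updates can be sketched with a keyed tag on every entry. In the example below, `sign_memory` stands in for an external validator and `AuthenticatedMemory` for the agent's store; both names and the shared-key design are assumptions. Semantic integrity checks are out of scope for this sketch.

```python
import hashlib
import hmac


def sign_memory(secret_key: bytes, text: str) -> str:
    """Tag issued by a (hypothetical) external validator that approved the text."""
    return hmac.new(secret_key, text.encode("utf-8"), hashlib.sha256).hexdigest()


class AuthenticatedMemory:
    """Agent memory that accepts and returns only entries carrying a valid tag."""

    def __init__(self, secret_key: bytes) -> None:
        self._key = secret_key
        self._entries: list[tuple[str, str]] = []  # (text, tag) pairs

    def _is_valid(self, text: str, tag: str) -> bool:
        expected = sign_memory(self._key, text)
        return hmac.compare_digest(tag.encode("utf-8"), expected.encode("utf-8"))

    def write(self, text: str, tag: str) -> bool:
        """Store the entry only if the tag proves it was validated."""
        if not self._is_valid(text, tag):
            return False
        self._entries.append((text, tag))
        return True

    def read_verified(self) -> list[str]:
        """Re-check tags on retrieval and drop entries altered after storage."""
        return [text for text, tag in self._entries if self._is_valid(text, tag)]
```

Checking the tag on both write and read means an entry that is modified in storage after validation is dropped before the agent can act on it.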
Restrict Number of AI Model Queries
Limit the total number and rate of queries a user can perform.
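A minimal sketch of enforcing both limits follows: a sliding window caps the query rate and a running counter caps the lifetime total per user. The `QueryLimiter` class and its parameter names are assumptions, and the injectable `clock` exists only to make the behavior easy to test.

```python
import time
from collections import defaultdict, deque
from typing import Callable


class QueryLimiter:
    """Per-user limiter combining a sliding-window rate cap and a total cap."""

    def __init__(
        self,
        max_per_window: int,
        window_seconds: float,
        max_total: int,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_per_window = max_per_window
        self.window_seconds = window_seconds
        self.max_total = max_total
        self._clock = clock
        self._recent: dict[str, deque] = defaultdict(deque)  # recent timestamps
        self._total: dict[str, int] = defaultdict(int)  # lifetime query counts

    def allow(self, user_id: str) -> bool:
        now = self._clock()
        recent = self._recent[user_id]
        # Discard timestamps that have aged out of the window.
        while recent and now - recent[0] >= self.window_seconds:
            recent.popleft()
        if self._total[user_id] >= self.max_total:
            return False
        if len(recent) >= self.max_per_window:
            return False
        recent.append(now)
        self._total[user_id] += 1
        return True
```

Rejected queries are not counted against either limit in this sketch; a stricter policy could count them as well.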
Sanitize Training Data
Detect and remove or remediate poisoned training data. Training data should be sanitized prior to model training and recurrently for an active learning model.
Implement a filter to limit ingested training data. Establish a content policy that would remove unwanted content such as certain explicit or offensive language from being used.
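The content-policy filter described above might look like the following sketch, which splits records into kept and removed sets based on a term list. `BLOCKED_TERMS` and its placeholder entries are hypothetical, and this covers only the content filter, not detection of poisoned samples.

```python
import re

# Hypothetical content policy: lowercase terms that must not enter training data.
BLOCKED_TERMS = {"blockedword", "anotherblockedword"}


def violates_policy(text: str) -> bool:
    """Return True if any whole token in the text is on the blocked list."""
    tokens = set(re.findall(r"[a-z0-9']+", text.lower()))
    return not tokens.isdisjoint(BLOCKED_TERMS)


def filter_training_data(records: list[str]) -> tuple[list[str], list[str]]:
    """Split records into (kept, removed) according to the content policy."""
    kept: list[str] = []
    removed: list[str] = []
    for record in records:
        if violates_policy(record):
            removed.append(record)
        else:
            kept.append(record)
    return kept, removed
```

Returning the removed records alongside the kept ones makes it possible to audit what the policy excluded.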
User Training
Educate AI model developers on AI supply chain risks and potentially malicious AI artifacts. Educate users on how to identify deepfakes and phishing attempts.
Validate AI Model
Validate that AI models perform as intended by testing for backdoor triggers, potential for data leakage, or adversarial influence. Monitor the AI model for concept drift and training data drift, which may indicate data tampering or poisoning.
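The source does not prescribe a drift metric. As one possible approach, the sketch below computes the population stability index (PSI) between a reference sample of a feature and a current sample, using bins derived from the reference. The function name, bin count, and smoothing constant are assumptions, and any alerting threshold would need to be chosen per deployment.

```python
import math


def population_stability_index(
    reference: list[float], current: list[float], bins: int = 10
) -> float:
    """PSI between two samples of one feature; 0 means identical binned distributions."""
    low, high = min(reference), max(reference)
    width = (high - low) / bins or 1.0  # fall back to 1.0 if reference is constant

    def proportions(values: list[float]) -> list[float]:
        counts = [0] * bins
        for value in values:
            index = int((value - low) / width)
            counts[min(max(index, 0), bins - 1)] += 1  # clamp out-of-range values
        # A small constant keeps empty bins from producing log(0).
        return [(c + 1e-6) / (len(values) + bins * 1e-6) for c in counts]

    ref_p = proportions(reference)
    cur_p = proportions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_p, cur_p))
```

A PSI that rises over time for an input feature suggests the live data no longer resembles the data used at training time, which is a prompt for investigation rather than proof of tampering.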
Source
Where this page information comes from.
Original source
Open the public records and source datasets used for this page.