Record summary
A quick snapshot of what this page covers.
Lifecycle stage
A group of defenses with the same label.
10 AI defenses are grouped under ML Model Evaluation.
- ML lifecycle stage
- ML Model Evaluation
- Mitigation count
- 10
Related defenses
Defenses included in this group.
Adversarial Input Detection
Detect and block adversarial inputs or atypical queries that deviate from known benign behavior, exhibit behavior patterns observed in previous attacks or that come from potentially malicious IPs. Incorporate adversarial detection algorithms into the AI system prior to the AI model.
Control Access to AI Models and Data at Rest
Establish access controls on internal model registries and limit internal access to production models. Limit access to training data only to approved users.
Deepfake Detection
Apply deepfake detection algorithms against any untrusted or user-provided data, especially in impactful applications such as biometric verification, to block generated content.
Detectors may use a combination of approaches, including:
- AI models trained to differentiate between real and deepfake content.
- Identifying common inconsistencies in deepfake content, such as unnatural facial movements, audio mismatches, or pixel-level artifacts.
- Biometrics analysis, such blinking, eye movements, and microexpressions.
Generative AI Guardrails
Guardrails are safety controls that are placed between a generative AI model and the output shared with the user to prevent undesired inputs and outputs. Guardrails can take the form of validators such as filters, rule-based logic, or regular expressions, as well as AI-based approaches, such as classifiers and utilizing LLMs, or named entity recognition (NER) to evaluate the safety of the prompt or response. Domain specific methods can be employed to reduce risks in a variety of areas such as etiquette, brand damage, jailbreaking, false information, code exploits, SQL injections, and data leakage.
Generative AI Guidelines
Guidelines are safety controls that are placed between user-provided input and a generative AI model to help direct the model to produce desired outputs and prevent undesired outputs.
Guidelines can be implemented as instructions appended to all user prompts or as part of the instructions in the system prompt. They can define the goal(s), role, and voice of the system, as well as outline safety and security parameters.
Generative AI Model Alignment
When training or fine-tuning a generative AI model it is important to utilize techniques that improve model alignment with safety, security, and content policies.
The fine-tuning process can potentially remove built-in safety mechanisms in a generative AI model, but utilizing techniques such as Supervised Fine-Tuning, Reinforcement Learning from Human Feedback or AI Feedback, and Targeted Safety Context Distillation can improve the safety and alignment of the model.
Input Restoration
Preprocess all inference data to nullify or reverse potential adversarial perturbations.
Passive AI Output Obfuscation
Decreasing the fidelity of model outputs provided to the end user can reduce an adversary's ability to extract information about the model and optimize attacks for the model.
User Training
Educate AI model developers to on AI supply chain risks and potentially malicious AI artifacts. Educate users on how to identify deepfakes and phishing attempts.
Validate AI Model
Validate that AI models perform as intended by testing for backdoor triggers, potential for data leakage, or adversarial influence. Monitor AI model for concept drift and training data drift, which may indicate data tampering and poisoning.
Source
Where this page information comes from.
Original source
Original source links
Open the public records and source datasets used for this page.