Record summary
A quick snapshot of what this page covers.
Risk profile
How this risk is described and categorized.
"AI models can be poisoned during instruction tuning when models are tuned using pairs of instructions and desired outputs. Poisoning in instruction tuning can be achieved with a lower number of compromised samples, as instruction tuning requires a relatively small number of samples for fine-tuning [155, 211]. Anonymous crowdsourcing efforts may be employed in collecting instruction tuning datasets and can further contribute to poisoning attacks [187]. These attacks might be harder to detect than traditional data poisoning attacks."
Suggested mitigations
Defenses that may help with related attacks.
Control Access to AI Models and Data at Rest
Sanitize Training Data
Validate AI Model
Code Signing
Maintain AI Dataset Provenance
Memory Hardening
Verify AI Artifacts
AI Bill of Materials
Model Hardening
Use Ensemble Methods
Input Restoration
Adversarial Input Detection
AI Telemetry Logging
Privileged AI Agent Permissions Configuration
Single-User AI Agent Permissions Configuration
AI Agent Tools Permissions Configuration
Human In-the-Loop for AI Agent Actions
Restrict AI Agent Tool Invocation on Untrusted Data
Segmentation of AI Agent Components
Input and Output Validation for AI Agent Components
Limit Model Artifact Release
Encrypt Sensitive Information
AI Model Distribution Methods
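As an illustration of how two of the listed mitigations, Sanitize Training Data and Maintain AI Dataset Provenance, might apply to instruction tuning data, the following is a minimal Python sketch. It is not taken from the source. The record fields (`instruction`, `output`, `source`), the function names, and the idea of a trusted-source allowlist are all assumptions made for this example. The sketch records a digest for each reviewed instruction/output pair and later rejects any pair that comes from an untrusted contributor or whose content no longer matches a reviewed digest.

```python
import hashlib
import json


def record_digest(record):
    """Return a stable SHA-256 digest of one instruction/output record.

    The record layout (instruction, output, source) is a hypothetical
    schema chosen for this sketch.
    """
    canonical = json.dumps(record, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def build_manifest(reviewed_records):
    """Collect digests of records that have already been reviewed."""
    return {record_digest(r) for r in reviewed_records}


def filter_records(records, trusted_sources, manifest):
    """Split records into (kept, rejected).

    A record is kept only if its source is on the allowlist and its
    digest matches a reviewed record, so edits made after review or
    submissions from unknown contributors are rejected.
    """
    kept, rejected = [], []
    for r in records:
        is_trusted = r.get("source") in trusted_sources
        is_unmodified = record_digest(r) in manifest
        if is_trusted and is_unmodified:
            kept.append(r)
        else:
            rejected.append(r)
    return kept, rejected


if __name__ == "__main__":
    reviewed = [
        {"instruction": "Summarize the text.", "output": "A short summary.", "source": "internal-team"},
    ]
    manifest = build_manifest(reviewed)

    incoming = reviewed + [
        # Same instruction, altered output: digest no longer matches.
        {"instruction": "Summarize the text.", "output": "Visit example.test now.", "source": "internal-team"},
        # Anonymous contribution: source is not on the allowlist.
        {"instruction": "Translate to French.", "output": "Bonjour.", "source": "anonymous"},
    ]
    kept, rejected = filter_records(incoming, {"internal-team"}, manifest)
    print(len(kept), "kept;", len(rejected), "rejected")
```

A check like this only detects changes made after review and contributions from unknown sources. It does not detect poisoned pairs that were accepted during review, so it would complement, not replace, content-level sanitization and model validation.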
Source
Research source for this risk, when available.
Included resource
Risk Sources and Risk Management Measures in Support of Standards for General-Purpose AI Systems
Original source
MIT AI Risk Repository