Poisoning

Record summary

A quick snapshot of what this page covers.

Techniques: 27. Attack methods connected to this risk.

Mitigations: 21. Defenses that may help with related attacks.

Domain: 2. Privacy & Security. The broad risk area this belongs to.

Risk profile

How this risk is described and categorized.

"Data Poisoning involves deliberately corrupting a model’s training dataset to introduce vulnerabilities, derail its learning process, or cause it to make incorrect predictions (Carlini et al., 2023). For example, the tool Nightshade is a data poisoning tool, which allows artists to add invisible changes to the pixels in their art before uploading online, to break any models that use it for training. Such attacks exploit the fact that most GenAI models are trained on publicly available datasets like images and videos scraped from the web, which malicious actors can easily compromise."
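
As a rough illustration of how corrupted training samples can "cause it to make incorrect predictions", the toy sketch below trains a nearest-centroid classifier on one-dimensional data, first on a clean set and then on the same set with a few mislabeled points added. The data, the labels "a" and "b", and the function names `fit_centroids` and `predict` are all hypothetical and chosen only for illustration; this is not the method studied in the cited source, and real poisoning attacks on generative models are far more subtle.

```python
from statistics import mean


def fit_centroids(samples):
    """Return {label: mean feature value} for a list of (feature, label) pairs."""
    by_label = {}
    for x, y in samples:
        by_label.setdefault(y, []).append(x)
    return {y: mean(xs) for y, xs in by_label.items()}


def predict(centroids, x):
    """Return the label whose centroid lies closest to x."""
    return min(centroids, key=lambda y: abs(centroids[y] - x))


# Hypothetical clean training set: class "a" near 1, class "b" near 9.
clean = [(0, "a"), (1, "a"), (2, "a"), (8, "b"), (9, "b"), (10, "b")]

# Hypothetical poisoned samples: points far from class "a" but labeled "a".
poison = [(12, "a"), (12, "a"), (12, "a")]

clean_model = fit_centroids(clean)
poisoned_model = fit_centroids(clean + poison)

# The same query point is classified differently once the poison is included.
print(predict(clean_model, 6), predict(poisoned_model, 6))
```

In this sketch, three injected points are enough to drag the "a" centroid from 1 to 6.5, so a query at 6 moves from class "b" to class "a".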

Domain: 2. Privacy & Security

Subdomain: 2.2 > AI system security vulnerabilities and attacks

Entity: 1 - Human

Intent: 1 - Intentional

Timing: 1 - Pre-deployment

Category: Misuse tactics to compromise GenAI systems (Model integrity)

Subcategory: Poisoning

Related techniques

Attack methods connected to this risk.

Suggested mitigations

Defenses that may help with related attacks.

Source

Research source for this risk, when available.

Included resource

Generative AI Misuse: A Taxonomy of Tactics and Insights from Real-World Data

Authors: Marchal & Xu

Year: 2024

Type: Journal Article

DOI: https://doi.org/10.48550/arXiv.2406.13843

URL: https://arxiv.org/abs/2406.13843

Original source

MIT AI Risk Repository

Open the public repository used for AI risk records and taxonomy fields.

Repository: https://airisk.mit.edu/

Suggested mitigations

Control Access to AI Models and Data at Rest

Validate AI Model

Code Signing

Sanitize Training Data

Maintain AI Dataset Provenance

AI Telemetry Logging

Privileged AI Agent Permissions Configuration

Single-User AI Agent Permissions Configuration

AI Agent Tools Permissions Configuration

Human In-the-Loop for AI Agent Actions

Restrict AI Agent Tool Invocation on Untrusted Data

Segmentation of AI Agent Components

Input and Output Validation for AI Agent Components

Memory Hardening

Model Hardening

Use Ensemble Methods

Input Restoration

Adversarial Input Detection

AI Bill of Materials

Limit Model Artifact Release

Verify AI Artifacts
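
Several of the mitigations listed above, such as Maintain AI Dataset Provenance and Verify AI Artifacts, rest on being able to tell whether training files have changed since they were last trusted. A minimal sketch of that idea, assuming a SHA-256 manifest recorded at a trusted point in time, is shown below. The function names `build_manifest` and `find_tampered` and the in-memory file contents are hypothetical and are not taken from the repository or the cited paper.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def build_manifest(files: dict) -> dict:
    """Map each file name to the digest of its content (recorded when trusted)."""
    return {name: sha256_hex(content) for name, content in files.items()}


def find_tampered(files: dict, manifest: dict) -> list:
    """Return sorted names that are missing, unexpected, or changed."""
    names = set(files) | set(manifest)
    return sorted(
        name
        for name in names
        if name not in files or manifest.get(name) != sha256_hex(files[name])
    )


# Hypothetical dataset held in memory as {file name: raw bytes}.
trusted = {"img_001.txt": b"cat", "img_002.txt": b"dog"}
manifest = build_manifest(trusted)

# Later copy of the dataset: one file altered, one file added.
later = {"img_001.txt": b"cat", "img_002.txt": b"dgo", "img_003.txt": b"bird"}

print(find_tampered(trusted, manifest))
print(find_tampered(later, manifest))
```

A check like this only detects changes made after the manifest was recorded. It cannot flag samples that were already poisoned when the data was first collected, which is why the list also includes content-level defenses such as Sanitize Training Data.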
