Adversarial AI: Data and Model Exfiltration Attacks

Record summary

A quick snapshot of what this page covers.

Techniques: 40. Attack methods connected to this risk.

Mitigations: 32. Defenses that may help with related attacks.

Domain: 2. Privacy & Security. The broad risk area this belongs to.

Risk profile

How this risk is described and categorized.

"Other forms of abuse can include privacy attacks that allow adversaries to exfiltrate or gain knowledge of the private training data set or other valuable assets. For example, privacy attacks such as membership inference can allow an attacker to infer the specific private medical records that were used to train a medical AI diagnosis assistant. Another risk of abuse centers around attacks that target the intellectual property of the AI assistant through model extraction and distillation attacks that exploit the tension between API access and confidentiality in ML models. Without the proper mitigations, these vulnerabilities could allow attackers to abuse access to a public-facing model API to exfiltrate sensitive intellectual property such as sensitive training data and a model’s architecture and learned parameters."

Domain: 2. Privacy & Security

Subdomain: 2.1 > Compromise of privacy by leaking or correctly inferring sensitive information

Entity: 1 - Human

Intent: 1 - Intentional

Timing: 2 - Post-deployment

Category: Malicious Uses

Subcategory: Adversarial AI: Data and Model Exfiltration Attacks
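
The membership inference attack named in the risk description can be illustrated with a loss-threshold test: the attacker queries the model on a candidate record and guesses "member" when the model is unusually confident about it. The sketch below is only an illustration under stated assumptions. The function `loss_threshold_attack`, the memorizing `toy_model`, the `TRAIN` records, and the threshold value are all hypothetical and do not come from the source paper or the repository.

```python
import math
from typing import Callable, Sequence, Tuple

Record = Tuple[float, float]


def loss_threshold_attack(
    query_fn: Callable[[Record], Sequence[float]],
    record: Record,
    label: int,
    threshold: float,
) -> bool:
    """Guess that `record` was in the training set when the model's
    cross-entropy loss on (record, label) falls below `threshold`."""
    probability = query_fn(record)[label]
    loss = -math.log(max(probability, 1e-12))  # clamp to avoid log(0)
    return loss < threshold


# Hypothetical stand-in for an overfit model behind a public API: it is
# near-certain on records it memorized and uninformative on everything else.
TRAIN = {(1.0, 2.0): 1, (3.0, 1.0): 0}


def toy_model(record: Record) -> Sequence[float]:
    if record in TRAIN:
        probs = [0.01, 0.01]
        probs[TRAIN[record]] = 0.99
        return probs
    return [0.5, 0.5]


if __name__ == "__main__":
    for candidate, label in [((1.0, 2.0), 1), ((2.0, 2.0), 1)]:
        guess = loss_threshold_attack(toy_model, candidate, label, threshold=0.1)
        print(candidate, "member" if guess else "non-member")
```

Real models leak far less cleanly than this toy, so a practical attacker would have to calibrate the threshold, for example on records known to be outside the training set.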

Related techniques

Attack methods connected to this risk.

Suggested mitigations

Defenses that may help with related attacks.

Source

Research source for this risk, when available.

Included resource

The Ethics of Advanced AI Assistants

Authors: Gabriel et al.; Year: 2024; Type: Preprint

DOI: 10.48550/arXiv.2404.16244

URL: https://doi.org/10.48550/arXiv.2404.16244

Original source

MIT AI Risk Repository

Open the public repository used for AI risk records and taxonomy fields.

Repository: https://airisk.mit.edu/

Suggested mitigations

Limit Model Artifact Release

Control Access to AI Models and Data at Rest

Encrypt Sensitive Information

AI Model Distribution Methods

Code Signing

AI Telemetry Logging

Privileged AI Agent Permissions Configuration

Single-User AI Agent Permissions Configuration

AI Agent Tools Permissions Configuration

Human In-the-Loop for AI Agent Actions

Restrict AI Agent Tool Invocation on Untrusted Data

Segmentation of AI Agent Components

Input and Output Validation for AI Agent Components

Restrict Number of AI Model Queries

Control Access to AI Models and Data in Production

Passive AI Output Obfuscation

Model Hardening

Use Ensemble Methods

Input Restoration

Adversarial Input Detection

Sanitize Training Data

Verify AI Artifacts

Maintain AI Dataset Provenance

Generative AI Guardrails

Generative AI Guidelines

Generative AI Model Alignment

AI Bill of Materials

Limit Public Release of Information

Restrict Library Loading

Vulnerability Scanning

User Training

Validate AI Model
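
Two of the mitigations listed above, "Restrict Number of AI Model Queries" and "Passive AI Output Obfuscation", lend themselves to a short sketch. The code below is one possible reading of those names, not an implementation taken from the repository: `QueryBudget` (a sliding-window per-client limit) and `coarsen_output` (returning only the top label with a rounded score) are hypothetical helpers, and the limits and rounding precision are arbitrary assumptions.

```python
import time
from collections import defaultdict, deque
from typing import Deque, Dict, List, Optional, Sequence, Tuple


class QueryBudget:
    """Sliding-window cap on how many queries each client may issue."""

    def __init__(self, max_queries: int, window_seconds: float) -> None:
        self.max_queries = max_queries
        self.window_seconds = window_seconds
        self._history: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, client_id: str, now: Optional[float] = None) -> bool:
        """Record the query and return True, or return False if over budget."""
        now = time.monotonic() if now is None else now
        history = self._history[client_id]
        # Forget queries that have aged out of the window.
        while history and now - history[0] >= self.window_seconds:
            history.popleft()
        if len(history) >= self.max_queries:
            return False
        history.append(now)
        return True


def coarsen_output(
    probabilities: Sequence[float], top_k: int = 1, decimals: int = 1
) -> List[Tuple[int, float]]:
    """Return only the top_k (label, score) pairs with rounded scores,
    reducing the signal available to extraction or inference attacks."""
    ranked = sorted(enumerate(probabilities), key=lambda pair: pair[1], reverse=True)
    return [(label, round(score, decimals)) for label, score in ranked[:top_k]]


if __name__ == "__main__":
    budget = QueryBudget(max_queries=2, window_seconds=60)
    for t in (0, 1, 2, 61):
        print(t, budget.allow("client-a", now=t))
    print(coarsen_output([0.07, 0.81, 0.12]))
```

Both measures trade utility for protection: tighter budgets and coarser outputs raise the cost of an extraction attempt but also degrade the service for legitimate users, so the parameters would need tuning per deployment.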