Adversarial AI (General)

Record summary

A quick snapshot of what this page covers.

Techniques: 37. Attack methods connected to this risk.

Mitigations: 32. Defenses that may help with related attacks.

Domain: 2. Privacy & Security. The broad risk area this belongs to.

Risk profile

How this risk is described and categorized.

"Adversarial AI refers to a class of attacks that exploit vulnerabilities in machine-learning (ML) models. This class of misuse exploits vulnerabilities introduced by the AI assistant itself and is a form of misuse that can enable malicious entities to exploit privacy vulnerabilities and evade the model’s built-in safety mechanisms, policies, and ethical boundaries of the model. Besides the risks of misuse for offensive cyber operations, advanced AI assistants may also represent a new target for abuse, where bad actors exploit the AI systems themselves and use them to cause harm. While our understanding of vulnerabilities in frontier AI models is still an open research problem, commercial firms and researchers have already documented attacks that exploit vulnerabilities that are unique to AI and involve evasion, data poisoning, model replication, and exploiting traditional software flaws to deceive, manipulate, compromise, and render AI systems ineffective. This threat is related to, but distinct from, traditional cyber activities. Unlike traditional cyberattacks that typically are caused by ‘bugs’ or human mistakes in code, adversarial AI attacks are enabled by inherent vulnerabilities in the underlying AI algorithms and how they integrate into existing software ecosystems."
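To make the distinction between a code bug and an algorithmic vulnerability concrete, the following is a minimal sketch of an evasion attack on a toy linear classifier. It is not taken from the source: the weights, bias, input values, and the helper names `score`, `predict`, and `evade` are all hypothetical and chosen only for illustration. The classifier code contains no bug, yet a small bounded change to the input flips its decision.

```python
# Toy evasion attack on a hypothetical linear classifier (illustrative only).

def score(weights, bias, x):
    """Linear decision score: positive means class 1, otherwise class 0."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def predict(weights, bias, x):
    return 1 if score(weights, bias, x) > 0 else 0


def sign(v):
    return (v > 0) - (v < 0)


def evade(weights, bias, x, epsilon):
    """Shift every feature by at most epsilon toward the opposite class.

    For a linear model the gradient of the score with respect to the input
    is the weight vector, so stepping along sign(weights) is the steepest
    bounded change (the idea behind fast-gradient-sign style attacks).
    """
    direction = -1 if predict(weights, bias, x) == 1 else 1
    return [xi + direction * epsilon * sign(w) for w, xi in zip(weights, x)]


# Assumed model parameters and input, not drawn from any real system.
WEIGHTS = [2.0, -1.0, 0.5]
BIAS = -0.5
x_clean = [0.5, 0.25, 0.5]

x_adv = evade(WEIGHTS, BIAS, x_clean, epsilon=0.25)

print("clean prediction:", predict(WEIGHTS, BIAS, x_clean))
print("adversarial prediction:", predict(WEIGHTS, BIAS, x_adv))
```

With these illustrative values, no feature moves by more than 0.25, and the predicted class changes from 1 to 0. Attacks on deployed models are considerably more involved, but many gradient-based evasion techniques build on the same idea.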

Domain: 2. Privacy & Security

Subdomain: 2.2 AI system security vulnerabilities and attacks

Entity: 3 - Other

Intent: 1 - Intentional

Timing: 2 - Post-deployment

Category: Malicious Uses

Subcategory: Adversarial AI (General)

Related techniques

Attack methods connected to this risk.

Suggested mitigations

Defenses that may help with related attacks.

Source

Research source for this risk, when available.

Included resource

The Ethics of Advanced AI Assistants

Authors: Gabriel et al.

Year: 2024

Type: Preprint

DOI: 10.48550/arXiv.2404.16244

URL: https://doi.org/10.48550/arXiv.2404.16244

Original source

MIT AI Risk Repository

Open the public repository used for AI risk records and taxonomy fields.

Repository: https://airisk.mit.edu/

Suggested mitigations

AI Telemetry Logging

Privileged AI Agent Permissions Configuration

Single-User AI Agent Permissions Configuration

AI Agent Tools Permissions Configuration

Human In-the-Loop for AI Agent Actions

Restrict AI Agent Tool Invocation on Untrusted Data

Segmentation of AI Agent Components

Input and Output Validation for AI Agent Components

Limit Model Artifact Release

Control Access to AI Models and Data at Rest

Sanitize Training Data

Validate AI Model

AI Bill of Materials

Maintain AI Dataset Provenance

Verify AI Artifacts

Code Signing

Memory Hardening

Model Hardening

Use Ensemble Methods

Input Restoration

Adversarial Input Detection

Use Multi-Modal Sensors

Deepfake Detection

Generative AI Guardrails

Generative AI Guidelines

Generative AI Model Alignment

Restrict Library Loading

Vulnerability Scanning

User Training

Passive AI Output Obfuscation

Restrict Number of AI Model Queries

Control Access to AI Models and Data in Production
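As one illustration of how an item in this list might look in practice, the following is a minimal sketch of "Restrict Number of AI Model Queries", assuming a per-client sliding-window budget. The class name `QueryBudget`, its parameters, and the client identifier are hypothetical and not drawn from the source. A production deployment would more likely rely on an API gateway or similar rate-limiting infrastructure.

```python
from collections import defaultdict, deque


class QueryBudget:
    """Sliding-window cap on model queries per client (hypothetical helper)."""

    def __init__(self, max_queries, window_seconds):
        self.max_queries = max_queries
        self.window_seconds = window_seconds
        # client id -> timestamps of accepted queries, oldest first
        self._history = defaultdict(deque)

    def allow(self, client_id, now):
        """Return True and record the query if the client is under budget."""
        timestamps = self._history[client_id]
        # Forget accepted queries that have aged out of the window.
        while timestamps and now - timestamps[0] >= self.window_seconds:
            timestamps.popleft()
        if len(timestamps) >= self.max_queries:
            return False
        timestamps.append(now)
        return True


# Timestamps are passed in explicitly so the example is deterministic.
budget = QueryBudget(max_queries=3, window_seconds=60)
decisions = [budget.allow("client-a", now=t) for t in (0, 10, 20, 30, 61)]
print(decisions)
```

In this run the fourth query is rejected because three queries already fall inside the 60-second window, and the fifth is accepted once the earliest one has expired. Capping query volume in this way is intended to raise the cost of attacks that depend on many probes, such as model replication.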
