AI Risk

Adversarial input

"Adversarial Inputs involve modifying individual input data to cause a model to malfunction. These modifications, which are often imperceptible to humans, exploit how the model makes decisions to produce errors (Wallace et al., 2019) and can be applied to text, but also to images, audio, or video (e.g. changing pixels in an image of a panda in a way that causes a model to label it as a gibbon)."

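The description above can be made concrete with a toy example. The following is a minimal sketch, not the method of the cited source: it assumes a hypothetical linear "image" classifier over 100 pixel values and nudges every pixel by a small amount in the direction given by the sign of the score's gradient (for a linear model, the sign of each weight). The names `WEIGHTS`, `BIAS`, `EPSILON`, and the labels are illustrative assumptions, and pixel values are not clipped to a valid range.

```python
# Toy adversarial input against a hypothetical linear classifier.
# All weights, inputs, and labels are invented for illustration only.

WEIGHTS = [0.5 if i % 2 == 0 else -0.5 for i in range(100)]
BIAS = 0.0
EPSILON = 0.03  # maximum change allowed per pixel


def score(weights, bias, x):
    """Linear decision score: positive means 'panda', negative means 'gibbon'."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def predict(weights, bias, x):
    return "panda" if score(weights, bias, x) >= 0 else "gibbon"


def sign(v):
    return (v > 0) - (v < 0)


def perturb(weights, bias, x, epsilon):
    """Move each pixel by at most epsilon toward the other side of the boundary.

    For a linear model the gradient of the score with respect to the input
    is the weight vector, so sign(w) gives the steepest per-pixel direction.
    """
    direction = -1.0 if score(weights, bias, x) >= 0 else 1.0
    return [xi + direction * epsilon * sign(w) for w, xi in zip(weights, x)]


# A clean input that the toy model labels 'panda'.
clean = [0.52 if i % 2 == 0 else 0.48 for i in range(100)]
adversarial = perturb(WEIGHTS, BIAS, clean, EPSILON)

print(predict(WEIGHTS, BIAS, clean))        # label for the clean input
print(predict(WEIGHTS, BIAS, adversarial))  # label after the small perturbation
```

Each pixel changes by only 0.03, but because the changes all push the score in the same direction, their combined effect is enough to cross the decision boundary. Real attacks on deep models follow the same idea with gradients computed through the network.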


Record summary

A quick snapshot of what this page covers.

Techniques: 8. Attack methods connected to this risk.

Mitigations: 16. Defenses that may help with related attacks.

Domain: 2. Privacy & Security. The broad risk area this belongs to.

Risk profile

How this risk is described and categorized.

Domain: 2. Privacy & Security

Subdomain: 2.2 > AI system security vulnerabilities and attacks

Entity: 1 - Human

Intent: 1 - Intentional

Timing: 2 - Post-deployment

Category: Misuse tactics to compromise GenAI systems (Model integrity)

Subcategory: Adversarial input

Suggested mitigations

Defenses that may help with related attacks.

Use Multi-Modal Sensors

Lifecycle: Business and Data Understanding, Data Preparation, + 1 more

Category: Technical - Cyber

Sanitize Training Data

Lifecycle: Business and Data Understanding, Data Preparation, + 1 more

Category: Technical - ML

Maintain AI Dataset Provenance

Lifecycle: Data Preparation, Business and Data Understanding

Category: Technical - ML

AI Telemetry Logging

Lifecycle: Deployment, Monitoring and Maintenance

Category: Technical - Cyber

Privileged AI Agent Permissions Configuration

Lifecycle: Deployment

Category: Technical - Cyber

Single-User AI Agent Permissions Configuration

Lifecycle: Deployment

Category: Technical - Cyber

AI Agent Tools Permissions Configuration

Lifecycle: Deployment

Category: Technical - Cyber

Human In-the-Loop for AI Agent Actions

Lifecycle: Deployment

Category: Technical - ML

Restrict AI Agent Tool Invocation on Untrusted Data

Lifecycle: Deployment

Category: Technical - ML

Segmentation of AI Agent Components

Lifecycle: Deployment, Business and Data Understanding

Category: Technical - Cyber

Input and Output Validation for AI Agent Components

Lifecycle: Business and Data Understanding, Data Preparation, + 1 more

Category: Technical - ML

Model Hardening

Lifecycle: Data Preparation, ML Model Engineering

Category: Technical - ML

Use Ensemble Methods

Lifecycle: ML Model Engineering

Category: Technical - ML

Validate AI Model

Lifecycle: ML Model Evaluation, Monitoring and Maintenance

Category: Technical - ML

Input Restoration

Lifecycle: Data Preparation, ML Model Evaluation, + 2 more

Category: Technical - ML

Adversarial Input Detection

Lifecycle: Data Preparation, ML Model Engineering, + 3 more

Category: Technical - ML
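The list above names "Input Restoration" and "Adversarial Input Detection" without describing them. As one possible reading, and not the repository's own definition, the sketch below coarsely quantizes an input to wash out small perturbations (restoration) and flags the input when the model's label changes after quantization (detection). The classifier, the `squeeze` and `is_suspicious` helpers, and all values are hypothetical. The toy is deliberately constructed so that the clean input sits exactly on the quantization grid; real inputs would not, and an attacker who knows about the defense could adapt to it.

```python
# Toy input restoration and detection by quantization.
# Model, inputs, and helper names are assumptions made for illustration.

WEIGHTS = [0.5 if i % 2 == 0 else -0.5 for i in range(100)]
BIAS = 1.0


def predict(x):
    """Hypothetical linear classifier: non-negative score means 'panda'."""
    s = sum(w * xi for w, xi in zip(WEIGHTS, x)) + BIAS
    return "panda" if s >= 0 else "gibbon"


def squeeze(x, levels=5):
    """Restoration step: snap each value in [0, 1] to one of `levels` grid points."""
    step = 1.0 / (levels - 1)
    return [round(xi / step) * step for xi in x]


def is_suspicious(x, levels=5):
    """Detection step: flag inputs whose label changes once they are quantized."""
    return predict(x) != predict(squeeze(x, levels))


# Clean input: every pixel is 0.5, which lies on the 5-level grid.
clean = [0.5] * 100

# Perturbed input: each pixel moved by 0.03 against the sign of its weight.
adversarial = [0.47 if i % 2 == 0 else 0.53 for i in range(100)]

print(predict(clean), is_suspicious(clean))
print(predict(adversarial), is_suspicious(adversarial))
print(predict(squeeze(adversarial)))  # label after restoration
```

The design choice here is to compare the model's output on the raw and the restored input rather than to inspect the input directly, so the same check works for any model that exposes a prediction function.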

Source

Research source for this risk, when available.

Included resource

Generative AI Misuse: A Taxonomy of Tactics and Insights from Real-World Data

Authors: Marchal & Xu

Year: 2024

Type: Journal Article

DOI: https://doi.org/10.48550/arXiv.2406.13843

URL: https://arxiv.org/abs/2406.13843

Original source

MIT AI Risk Repository

Open the public repository used for AI risk records and taxonomy fields.

Repository: https://airisk.mit.edu/