Attacking LLMs via Additional Modalities a

Record summary

A quick snapshot of what this page covers.

Techniques: 41. Attack methods connected to this risk.

Mitigations: 19. Defenses that may help with related attacks.

Domain: 2. Privacy & Security. The broad risk area this belongs to.

Risk profile

How this risk is described and categorized.

"LLMs can now process modalities other than text, e.g. images or video frames (OpenAI, 2023c; Gemini Team, 2023). Several studies show that gradient-based attacks on multimodal models are easy and effective (Carlini et al., 2023a; Bailey et al., 2023; Qi et al., 2023b). These attacks manipulate images that are input to the model (via an appropriate encoding). GPT-4Vision (OpenAI, 2023c) is vulnerable to jailbreaks and exfiltration attacks through much simpler means as well, e.g. writing jailbreaking text in the image (Willison, 2023a; Gong et al., 2023). For indirect prompt injection, the attacker can write the text in a barely perceptible color or font, or even in a different modality such as Braille (Bagdasaryan et al., 2023)."
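As a defensive illustration of the "barely perceptible color" case in the passage above, the following minimal sketch flags text whose color is nearly indistinguishable from its background. It uses the WCAG 2.x relative luminance and contrast ratio formulas. The function names, the `1.5` threshold, and the premise that foreground and background colors have already been extracted (for example from page markup or an OCR step) are all assumptions made for illustration; none of them come from the source paper. The check does not address gradient-based image perturbations or text encoded in other forms such as Braille.

```python
from typing import Tuple

RGB = Tuple[int, int, int]  # 8-bit sRGB triple, e.g. (255, 255, 255)


def _linearize(channel: int) -> float:
    """Convert one 8-bit sRGB channel to linear light (WCAG 2.x definition)."""
    c = channel / 255.0
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb: RGB) -> float:
    """WCAG relative luminance in [0, 1]: 0 is black, 1 is white."""
    r, g, b = (_linearize(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(fg: RGB, bg: RGB) -> float:
    """WCAG contrast ratio, from 1 (identical colors) to 21 (black on white)."""
    lighter, darker = sorted(
        (relative_luminance(fg), relative_luminance(bg)), reverse=True
    )
    return (lighter + 0.05) / (darker + 0.05)


def is_barely_perceptible(fg: RGB, bg: RGB, threshold: float = 1.5) -> bool:
    """Hypothetical heuristic: flag text with contrast below `threshold`.

    The default threshold is an arbitrary illustrative value, not a standard.
    """
    return contrast_ratio(fg, bg) < threshold
```

A validation step might call `is_barely_perceptible(text_color, background_color)` on each text region of untrusted content and route flagged regions to closer inspection before the content reaches the model.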

Domain: 2. Privacy & Security

Subdomain: 2.2 > AI system security vulnerabilities and attacks

Entity: 1 - Human

Intent: 1 - Intentional

Timing: 2 - Post-deployment

Category: Jailbreaks and Prompt Injections Threaten Security of LLMs

Subcategory: Attacking LLMs via Additional Modalities a

Related techniques

Attack methods connected to this risk.

Suggested mitigations

Defenses that may help with related attacks.

Source

Research source for this risk, when available.

Included resource

Foundational Challenges in Assuring Alignment and Safety of Large Language Models

Authors: Anwar et al.

Year: 2024

Type: Preprint

DOI: 10.48550/arXiv.2404.09932

URL: https://arxiv.org/abs/2404.09932

Original source

MIT AI Risk Repository

Open the public repository used for AI risk records and taxonomy fields.

Repository: https://airisk.mit.edu/

Suggested mitigations

Memory Hardening

AI Telemetry Logging

Input and Output Validation for AI Agent Components

Generative AI Guardrails

Generative AI Guidelines

Generative AI Model Alignment

Control Access to AI Models and Data in Production

Limit Model Artifact Release

Control Access to AI Models and Data at Rest

Encrypt Sensitive Information

AI Model Distribution Methods

Privileged AI Agent Permissions Configuration

Single-User AI Agent Permissions Configuration

AI Agent Tools Permissions Configuration

Human In-the-Loop for AI Agent Actions

Restrict AI Agent Tool Invocation on Untrusted Data

Segmentation of AI Agent Components

Restrict Number of AI Model Queries

Code Signing
