Morris II Worm: RAG-Based Attack - AI Case Study

AI Case Study

Researchers developed Morris II, a zero-click worm designed to attack generative AI (GenAI) ecosystems and propagate between connected GenAI systems. The worm uses an adversarial self-replicating prompt which uses prompt injection to replicate the prompt as output and perform malicious activity. The researchers demonstrate how this worm can propagate through an email system with a RAG-based assistant. They use a t...

Overview

Case steps7Steps described in the case record.

Techniques7Attack methods mentioned in the case steps.

Linked CVEs0Known vulnerabilities mentioned in the record.

Risk patterns

Patterns found in the case record and its linked vulnerabilities.

1Dominant ATLAS tactic. Execution appears in 3 case steps.
2Multiple attack methods. The case connects to 7 unique AI attack methods.

Procedure timeline

Search the case steps or filter them by attacker goal.

Execution3AI Model Access1Persistence1Exfiltration1Impact1

Step 1
AI Model Inference API Access
AI Model Access

The researchers use access to the publicly available GenAI model API that powers the target RAG-based email system.
Step 2
Direct
Execution

The researchers test prompts on public model APIs to identify working prompt injections.
Step 3
AI Agent Tool Invocation
Execution

The researchers send an email containing an adversarial self-replicating prompt, or "AI worm," to an address used in the target email system. The GenAI email assistant automatically ingests the email as part of its normal operations to generate a suggested reply. The email is stored in the database used for retrieval augmented generation, compromising the RAG system.
Step 4
Triggered
Execution

When the email containing the worm is retrieved by the email assistant in another reply generation task, the prompt injection changes the behavior of the GenAI email assistant.
Step 5
LLM Prompt Self-Replication
Persistence

The self-replicating portion of the prompt causes the generated output to contain the malicious prompt, allowing the worm to propagate.
Step 6
LLM Data Leakage
Exfiltration

The malicious instructions in the prompt cause the generated output to leak sensitive data such as emails, addresses, and phone numbers.
Step 7
User Harm
Impact

Users of the GenAI email assistant may have PII leaked to attackers.

Mitigations

Defenses connected to the attack methods in this case.

Top 10 of 13View all mitigations →

AI Agent Tools Permissions Configuration

When deploying tools that will be shared across multiple AI agents, it is important to implement robust policies and controls on permissions for the tools. These controls include applying the principle of least privilege along with delegated access, where the tools receive the permissions, identities, and restrictions of the AI agent calling them. These configurations may be implemented either in MCP servers which connect the agents to the tools calling them or, in more complex cases, directly in the configuration files of the tool.

AI Telemetry Logging

Implement logging of inputs and outputs of deployed AI models. When deploying AI agents, implement logging of the intermediate steps of agentic actions and decisions, data access and tool use, installation commands, and identity of the agent. Monitoring logs can help to detect security threats and mitigate impacts.

Additionally, having logging enabled can discourage adversaries who want to remain undetected from utilizing AI resources.

Control Access to AI Models and Data in Production

Require users to verify their identities before accessing a production model. Require authentication for API endpoints and monitor production model queries to ensure compliance with usage policies and to prevent model misuse.

Generative AI Guardrails

Guardrails are safety controls that are placed between a generative AI model and the output shared with the user to prevent undesired inputs and outputs. Guardrails can take the form of validators such as filters, rule-based logic, or regular expressions, as well as AI-based approaches, such as classifiers and utilizing LLMs, or named entity recognition (NER) to evaluate the safety of the prompt or response. Domain specific methods can be employed to reduce risks in a variety of areas such as etiquette, brand damage, jailbreaking, false information, code exploits, SQL injections, and data leakage.

Showing 4 of 10

Source evidence

Original public records and references for this case.

View all sources →

Original source

Original source links

Open the MITRE ATLAS data and public references used for this case study.

Repositoryhttps://github.com/mitre-atlas/atlas-data ATLAS.yamlhttps://github.com/mitre-atlas/atlas-data/blob/main/dist/ATLAS.yaml Schemahttps://github.com/mitre-atlas/atlas-data/blob/main/dist/schemas/atlas_output_schema.json Here Comes The AI Worm: Unleashing Zero-click Worms that Target GenAI-Powered Applicationshttps://arxiv.org/abs/2403.02817