Record summary
A quick snapshot of what this page covers.
Attack context
How this AI attack works in practice.
- ATLAS ID
- AML.T0061
- Priority score
- 159
Mitigations
Defenses that may help against this attack.
AML.M0020 - Generative AI Guardrails
Guardrails can help prevent replication attacks in model inputs and outputs.
AML.M0021 - Generative AI Guidelines
Guidelines can help instruct the model to produce more secure output, preventing the model from generating self-replicating outputs.
AML.M0022 - Generative AI Model Alignment
Model alignment can increase the security of models to self replicating prompt attacks.
Case studies
Examples from public reports and exercises.
Morris II Worm: RAG-Based Attack
Researchers developed Morris II, a zero-click worm designed to attack generative AI (GenAI) ecosystems and propagate between connected GenAI systems. The worm uses an adversarial self-replicating prompt which uses prompt injection to replicate the prompt as output and perform malicious activity. The researchers demonstrate how this worm can propagate through an email system with a RAG-based assistant. They use a target system that automatically ingests received emails, retrieves past correspondences, and generates a reply for the user. To carry out the attack, they send a malicious email containing the adversarial self-replicating prompt, which ends up in the RAG database. The malicious instructions in the prompt tell the assistant to include sensitive user data in the response. Future requests to the email assistant may retrieve the malicious email. This leads to propagation of the worm due to the self-replicating portion of the prompt, as well as leaking private information due to the malicious instructions.
Source
Where this page information comes from.
Original source
Original source links
Open the public records and source datasets used for this page.