Triggered - AI Security Technique

Overview

A source-backed snapshot of this AI security technique.

An adversary may trigger a prompt injection via a user action or event that occurs within the victim's environment. Triggered prompt injections often target AI agents, which can be activated by means the adversary identifies during Discovery (See Activation Triggers). These malicious prompts may be hidden or obfuscated from the user and may already exist somewhere in the victim's environment from the adversary performing Prompt Infiltration via Public-Facing Application. This type of injection may be used by the adversary to gain a foothold in the system or to target an unwitting user of the system.

Tactics0Attacker goals connected to this method.

Mitigations2Defenses that may help against this attack.

AI risks12Research-backed risks connected to this topic.

Technique details

Identifiers, maturity, and source taxonomy for this technique.

ATLAS ID: AML.T0051.002
Maturity: demonstrated
Priority score: 106

Attack flow

How to read the public records connected to this technique.

1. TechniqueRead the ATLAS description and evidence level.

2. TacticsSee which attacker goals this method supports.

3. ExamplesCheck whether public case studies mention it.

4. DefensesReview safeguards mapped by ATLAS.

5. SourcesOpen the original public records and references.

Impact

Why this technique may deserve attention in the current dataset.

Evidence leveldemonstrated
Mapped defenses2 ATLAS mitigation records
Public examples2 linked case study records
Research risks12 related MIT AI Risk records above the confidence threshold
Vulnerabilities0 linked CVE records

Mitigations

Defenses that may help against this attack.

2 recordsView all mitigations →

AML.M0024 - AI Telemetry Logging

Telemetry logging can help identify if unsafe prompts have been submitted to the LLM.

LifecycleDeployment + 1 moreCategoryTechnical - Cyber

DeploymentMonitoring

AML.M0033 - Input and Output Validation for AI Agent Components

Validation can prevent adversaries from executing prompt injections that could affect agentic workflows.

LifecycleBusiness and Data Understanding + 2 moreCategoryTechnical - ML

B&D UnderstandingData Preparation+1 more

Case studies

Examples from public reports and exercises.

2 recordsView all case studies →

Data Exfiltration via Agent Tools in Copilot Studio

Researchers from Zenity demonstrated how an organization’s data can be exfiltrated via prompt injections that target an AI-powered customer service agent.

The target system is a customer service agent built by Zenity in Copilot Studio. It is modeled after an agent built by McKinsey to streamline its customer service needs. The AI agent listens to a customer service email inbox where customers send their engagement requests. Upon receiving a request, the agent looks at the customer’s previous engagements, understands who the best consultant for the case is, and proceeds to send an email to the respective consultant regarding the request, including all of the relevant context the consultant will need to properly engage with the customer.

The Zenity researchers begin by performing targeting to identify an email inbox that is managed by an AI agent. Then they use prompt injections to discover details about the AI agent, such as its knowledge sources and tools. Once they understand the AI agent’s capabilities, the researchers are able to craft a prompt that retrieves private customer data from the organization’s RAG database and CRM, and exfiltrate it via the AI agent’s email tool.

Vendor Response: Microsoft quickly acknowledged and fixed the issue. The prompts used by the Zenity researchers in this exercise no longer work, however other prompts may still be effective.

Date2025-06-01

exercise

Morris II Worm: RAG-Based Attack

Researchers developed Morris II, a zero-click worm designed to attack generative AI (GenAI) ecosystems and propagate between connected GenAI systems. The worm uses an adversarial self-replicating prompt which uses prompt injection to replicate the prompt as output and perform malicious activity. The researchers demonstrate how this worm can propagate through an email system with a RAG-based assistant. They use a target system that automatically ingests received emails, retrieves past correspondences, and generates a reply for the user. To carry out the attack, they send a malicious email containing the adversarial self-replicating prompt, which ends up in the RAG database. The malicious instructions in the prompt tell the assistant to include sensitive user data in the response. Future requests to the email assistant may retrieve the malicious email. This leads to propagation of the worm due to the self-replicating portion of the prompt, as well as leaking private information due to the malicious instructions.

Date2024-03-05

exercise

Related risks

Research-backed risks connected to this topic.

Top 10 of 12View all risks →

Prompt injection

"Prompt Injections are a form of Adversarial Input that involve manipulating the text instructions given to a GenAI system (Liu et al., 2023). Prompt Injections exploit loopholes in a model’s architec- tures that have...

Domain2. Privacy & SecuritySubdomain2.2 > AI system security vulnerabilities and attacks

Confidence0.78

Jailbreaks and Prompt Injections Threaten Security of LLMs

"LLMs are not adversarially robust and are vulnerable to security failures such as jailbreaks and prompt-injection attacks. While a number of jailbreak attacks have been proposed in the literature, the lack of standar...

Domain2. Privacy & SecuritySubdomain2.2 > AI system security vulnerabilities and attacks

Confidence0.77

Adversarial AI: Prompt Injections

"Prompt injections represent another class of attacks that involve the malicious insertion of prompts or requests in LLM-based interactive systems, leading to unintended actions or disclosure of sensitive information...

Domain2. Privacy & SecuritySubdomain2.2 > AI system security vulnerabilities and attacks

Confidence0.74

Attacking LLMs via Additional Modalities a

"LLMs can now process modalities other than text, e.g. images or video frames (OpenAI, 2023c; Gemini Team, 2023). Several studies show that gradient-based attacks on multimodal models are easy and effective (Carlini e...

Domain2. Privacy & SecuritySubdomain2.2 > AI system security vulnerabilities and attacks

Confidence0.74

Showing 4 of 10