Record summary
A quick snapshot of what this page covers.
Attack context
How this AI attack works in practice.
An adversary may craft malicious prompts as inputs to an LLM that cause the LLM to act in unintended ways. These "prompt injections" are often designed to cause the model to ignore aspects of its original instructions and follow the adversary's instructions instead.
Prompt Injections can be an initial access vector to the LLM that provides the adversary with a foothold to carry out other steps in their operation. They may be designed to bypass defenses in the LLM, or allow the adversary to issue privileged commands. The effects of a prompt injection can persist throughout an interactive session with an LLM.
Malicious prompts may be injected directly by the adversary (Direct) either to leverage the LLM to generate harmful content or to gain a foothold on the system and lead to further effects. Prompts may also be injected indirectly when as part of its normal operation the LLM ingests the malicious prompt from another data source (Indirect). This type of injection can be used by the adversary to a foothold on the system or to target the user of the LLM. Malicious prompts may also be Triggered user actions or system events.
- ATLAS ID
- AML.T0051
- Priority score
- 118
Mitigations
Defenses that may help against this attack.
AML.M0024 - AI Telemetry Logging
Telemetry logging can help identify if unsafe prompts have been submitted to the LLM.
AML.M0019 - Control Access to AI Models and Data in Production
Use access controls in production to prevent adversaries from injecting malicious prompts.
AML.M0020 - Generative AI Guardrails
Guardrails can prevent harmful inputs that can lead to prompt injection.
AML.M0021 - Generative AI Guidelines
Model guidelines can instruct the model to refuse a response to unsafe inputs.
AML.M0022 - Generative AI Model Alignment
Model alignment can improve the parametric safety of a model by guiding it away from unsafe prompts and responses.
AML.M0033 - Input and Output Validation for AI Agent Components
Validation can prevent adversaries from executing prompt injections that could affect agentic workflows.
Case studies
Examples from public reports and exercises.
Data Exfiltration via Agent Tools in Copilot Studio
Researchers from Zenity demonstrated how an organization’s data can be exfiltrated via prompt injections that target an AI-powered customer service agent.
The target system is a customer service agent built by Zenity in Copilot Studio. It is modeled after an agent built by McKinsey to streamline its customer service needs. The AI agent listens to a customer service email inbox where customers send their engagement requests. Upon receiving a request, the agent looks at the customer’s previous engagements, understands who the best consultant for the case is, and proceeds to send an email to the respective consultant regarding the request, including all of the relevant context the consultant will need to properly engage with the customer.
The Zenity researchers begin by performing targeting to identify an email inbox that is managed by an AI agent. Then they use prompt injections to discover details about the AI agent, such as its knowledge sources and tools. Once they understand the AI agent’s capabilities, the researchers are able to craft a prompt that retrieves private customer data from the organization’s RAG database and CRM, and exfiltrate it via the AI agent’s email tool.
Vendor Response: Microsoft quickly acknowledged and fixed the issue. The prompts used by the Zenity researchers in this exercise no longer work, however other prompts may still be effective.
Source
Where this page information comes from.
Original source
Original source links
Open the public records and source datasets used for this page.