LLM Prompt Injection - AI Security Technique

Overview

A source-backed snapshot of this AI security technique.

An adversary may craft malicious prompts as inputs to an LLM that cause the LLM to act in unintended ways. These "prompt injections" are often designed to cause the model to ignore aspects of its original instructions and follow the adversary's instructions instead.

Prompt Injections can be an initial access vector to the LLM that provides the adversary with a foothold to carry out other steps in their operation. They may be designed to bypass defenses in the LLM, or allow the adversary to issue privileged commands. The effects of a prompt injection can persist throughout an interactive session with an LLM.

Malicious prompts may be injected directly by the adversary (Direct) either to leverage the LLM to generate harmful content or to gain a foothold on the system and lead to further effects. Prompts may also be injected indirectly when as part of its normal operation the LLM ingests the malicious prompt from another data source (Indirect). This type of injection can be used by the adversary to a foothold on the system or to target the user of the LLM. Malicious prompts may also be Triggered user actions or system events.

Tactics1Attacker goals connected to this method.

Mitigations6Defenses that may help against this attack.

AI risks12Research-backed risks connected to this topic.

Technique details

Identifiers, maturity, and source taxonomy for this technique.

ATLAS ID: AML.T0051
Maturity: realized
Priority score: 118

ATLAS tactics

Execution

Attack flow

How to read the public records connected to this technique.

1. TechniqueRead the ATLAS description and evidence level.

2. TacticsSee which attacker goals this method supports.

3. ExamplesCheck whether public case studies mention it.

4. DefensesReview safeguards mapped by ATLAS.

5. SourcesOpen the original public records and references.

Impact

Why this technique may deserve attention in the current dataset.

Evidence levelrealized
Mapped defenses6 ATLAS mitigation records
Public examples1 linked case study records
Research risks12 related MIT AI Risk records above the confidence threshold
Vulnerabilities0 linked CVE records

Mitigations

Defenses that may help against this attack.

6 recordsView all mitigations →

AML.M0024 - AI Telemetry Logging

Telemetry logging can help identify if unsafe prompts have been submitted to the LLM.

LifecycleDeployment + 1 moreCategoryTechnical - Cyber

DeploymentMonitoring

AML.M0019 - Control Access to AI Models and Data in Production

Use access controls in production to prevent adversaries from injecting malicious prompts.

LifecycleDeployment + 1 moreCategoryPolicy

DeploymentMonitoring

AML.M0020 - Generative AI Guardrails

Guardrails can prevent harmful inputs that can lead to prompt injection.

LifecycleML Model Engineering + 2 moreCategoryTechnical - ML

ML Model EngineeringML Model Evaluation+1 more

AML.M0021 - Generative AI Guidelines

Model guidelines can instruct the model to refuse a response to unsafe inputs.

LifecycleML Model Engineering + 2 moreCategoryTechnical - ML

ML Model EngineeringML Model Evaluation+1 more

Showing 4 of 6

Case studies

Examples from public reports and exercises.

1 recordView all case studies →

Data Exfiltration via Agent Tools in Copilot Studio

Researchers from Zenity demonstrated how an organization’s data can be exfiltrated via prompt injections that target an AI-powered customer service agent.

The target system is a customer service agent built by Zenity in Copilot Studio. It is modeled after an agent built by McKinsey to streamline its customer service needs. The AI agent listens to a customer service email inbox where customers send their engagement requests. Upon receiving a request, the agent looks at the customer’s previous engagements, understands who the best consultant for the case is, and proceeds to send an email to the respective consultant regarding the request, including all of the relevant context the consultant will need to properly engage with the customer.

The Zenity researchers begin by performing targeting to identify an email inbox that is managed by an AI agent. Then they use prompt injections to discover details about the AI agent, such as its knowledge sources and tools. Once they understand the AI agent’s capabilities, the researchers are able to craft a prompt that retrieves private customer data from the organization’s RAG database and CRM, and exfiltrate it via the AI agent’s email tool.

Vendor Response: Microsoft quickly acknowledged and fixed the issue. The prompts used by the Zenity researchers in this exercise no longer work, however other prompts may still be effective.

Date2025-06-01

exercise

Related risks

Research-backed risks connected to this topic.

Top 10 of 12View all risks →

Goal Hijacking

"Goal hijacking is a type of primary attack in prompt injection [58]. By injecting a phrase like “Ignore the above instruction and do ...” in the input, the attack could hijack the original goal of the designed prompt...

Domain2. Privacy & SecuritySubdomain2.2 > AI system security vulnerabilities and attacks

Confidence0.78

Prompt leaking

"Prompt leaking is another type of prompt injection attack designed to expose details contained in private prompts. According to [58], prompt leaking is the act of misleading the model to print the pre-designed instru...

Domain2. Privacy & SecuritySubdomain2.2 > AI system security vulnerabilities and attacks

Confidence0.78

Adversarial AI: Prompt Injections

"Prompt injections represent another class of attacks that involve the malicious insertion of prompts or requests in LLM-based interactive systems, leading to unintended actions or disclosure of sensitive information...

Domain2. Privacy & SecuritySubdomain2.2 > AI system security vulnerabilities and attacks

Confidence0.78

Prompt injection

"Prompt Injections are a form of Adversarial Input that involve manipulating the text instructions given to a GenAI system (Liu et al., 2023). Prompt Injections exploit loopholes in a model’s architec- tures that have...

Domain2. Privacy & SecuritySubdomain2.2 > AI system security vulnerabilities and attacks

Confidence0.78

Showing 4 of 10