AI Agent Context Poisoning - AI Security Technique

Overview

A source-backed snapshot of this AI security technique.

Adversaries may attempt to manipulate the context used by an AI agent's large language model (LLM) to influence the responses it generates or actions it takes. This allows an adversary to persistently change the behavior of the target agent and further their goals.

Context poisoning can be accomplished by prompting the an LLM to add instructions or preferences to memory (See Memory) or by simply prompting an LLM that uses prior messages in a thread as part of its context (See Thread).

Tactics1Attacker goals connected to this method.

Mitigations1Defenses that may help against this attack.

AI risks13Research-backed risks connected to this topic.

Technique details

Identifiers, maturity, and source taxonomy for this technique.

ATLAS ID: AML.T0080
Maturity: demonstrated
Priority score: 88

ATLAS tactics

Persistence

Attack flow

How to read the public records connected to this technique.

1. TechniqueRead the ATLAS description and evidence level.

2. TacticsSee which attacker goals this method supports.

3. ExamplesCheck whether public case studies mention it.

4. DefensesReview safeguards mapped by ATLAS.

5. SourcesOpen the original public records and references.

Impact

Why this technique may deserve attention in the current dataset.

Evidence leveldemonstrated
Mapped defenses1 ATLAS mitigation records
Public examples0 linked case study records
Research risks13 related MIT AI Risk records above the confidence threshold
Vulnerabilities0 linked CVE records

Mitigations

Defenses that may help against this attack.

1 recordView all mitigations →

AML.M0031 - Memory Hardening

Memory hardening can help protect LLM memory from manipulation and prevent poisoned memories from executing.

LifecycleML Model Engineering + 2 moreCategoryTechnical - ML

ML Model EngineeringDeployment+1 more

Case studies

Examples from public reports and exercises.

View all case studies →

No case studies found. No public example is connected to this attack in the current data.

Related risks

Research-backed risks connected to this topic.

Top 10 of 13View all risks →

Poisoning Attacks

"Poisoning attacks [143] could influence the behavior of the model by making small changes to the training data. A number of efforts could even leverage data poisoning techniques to implant hidden triggers into models...

Domain2. Privacy & SecuritySubdomain2.2 > AI system security vulnerabilities and attacks

Confidence0.75

Jailbreak in LLM Malicious Use - Poisoning Training Data

"In the data collecting and pre-training phase, malicious adversaries can Jailbreak LLMs through poisoning their training data to make the model to output harmful content."

Domain2. Privacy & SecuritySubdomain2.2 > AI system security vulnerabilities and attacks

Confidence0.73

Fine-tuning related (Poisoning models during instruction tuning)

"AI models can be poisoned during instruction tuning when models are tuned using pairs of instructions and desired outputs. Poisoning in instruction tuning can be achieved with a lower number of compromised samples, a...

Domain2. Privacy & SecuritySubdomain2.2 > AI system security vulnerabilities and attacks

Confidence0.73

Security - Robustness

While AI safety focuses on threats emanating from generative AI systems, security centers on threats posed to these systems. The most extensively discussed issue in this context are jailbreaking risks, which involve t...

Domain2. Privacy & SecuritySubdomain2.2 > AI system security vulnerabilities and attacks

Confidence0.73

Showing 4 of 10