Publish Poisoned AI Agent Tool - AI Security Technique

Overview

A source-backed snapshot of this AI security technique.

Adversaries may create and publish poisoned AI agent tools. Poisoned tools may contain an LLM Prompt Injection, which can lead to a variety of impacts.

Tools may be published to open source version control repositories (e.g. GitHub, GitLab), to package registries (e.g. npm), or to repositories specifically designed for sharing tools (e.g. OpenClaw Hub). These registries may be largely unregulated and may contain many poisoned tools [1]. Tools may also be published as remotely hosted servers [2].

References

Tactics1Attacker goals connected to this method.

Mitigations0Defenses that may help against this attack.

AI risks21Research-backed risks connected to this topic.

Technique details

Identifiers, maturity, and source taxonomy for this technique.

ATLAS ID: AML.T0104
Maturity: realized
Priority score: 165

ATLAS tactics

Resource Development

Attack flow

How to read the public records connected to this technique.

1. TechniqueRead the ATLAS description and evidence level.

2. TacticsSee which attacker goals this method supports.

3. ExamplesCheck whether public case studies mention it.

4. DefensesReview safeguards mapped by ATLAS.

5. SourcesOpen the original public records and references.

Impact

Why this technique may deserve attention in the current dataset.

Evidence levelrealized
Mapped defenses0 ATLAS mitigation records
Public examples3 linked case study records
Research risks21 related MIT AI Risk records above the confidence threshold
Vulnerabilities0 linked CVE records

Mitigations

Defenses that may help against this attack.

View all mitigations →

No connected defenses. No defense is connected to this attack in the current data.

Case studies

Examples from public reports and exercises.

3 recordsView all case studies →

Supply Chain Compromise via Poisoned ClawdBot Skill

A security researcher demonstrated a proof-of-concept supply chain attack using a poisoned ClawdBot Skill shared on ClawdHub, a Skill registry for agents. The poisoned Skill contained a prompt injection that caused ClawdBot to execute a shell command that reached the researcher's server. Although the researcher here used this access simply to warn users about the danger, they could have instead delivered a malicious payload and compromised the user's system. The security researcher recorded 16 different users who downloaded and executed the poisoned Skill in the first 8 hours of it being published on ClawdHub.

Date2026-01-26

exercise

Poisoned Postmark MCP Server Email Exfiltration

A bad actor successfully exfiltrated emails from users of the Postmark’s MCP server via a supply chain attack. Postmark is an email delivery service that allows organizations to send marketing and transactional emails via API. The Postmark MCP server allows users to interact with Postmark via AI agents.

The bad actor impersonated Postmark, by registering the postmark-mcp package name on npm. They initially published the legitimate versions of the MCP server. After the package became popular and reached over 1,000 downloads per week, the bad actor performed a rugpull and uploaded a malicious version of the package. The malicious version added the bad actor’s email address in the BCC line of all emails sent by the MCP tool. Users who upgraded to this version and continued to use the tool would have all emails exfiltrated to the bad actor.

Date2025-09-01

incident

Data Exfiltration via Remote Poisoned MCP Tool

Researchers at Invariant Labs demonstrated that AI agents configured with remote Model Context Protocol (MCP) Tools can be vulnerable to model poisoning attacks. They show that an MCP Tool can contain malicious prompts in its docstring description, which is ingested into the AI agent’s context, modifying its behavior.

They demonstrate this attack with a proof-of-concept MCP Tool that instructs the agent to perform additional actions before using the tool. The agent is instructed to read files containing credentials from the victim’s machine and store their contents in one of the input variables to the tool. When the tool runs, the victim’s credentials are exfiltrated to the poisoned MCP server.

Date2025-04-01

exercise

Related risks

Research-backed risks connected to this topic.

Top 10 of 21View all risks →

Security - Robustness

While AI safety focuses on threats emanating from generative AI systems, security centers on threats posed to these systems. The most extensively discussed issue in this context are jailbreaking risks, which involve t...

Domain2. Privacy & SecuritySubdomain2.2 > AI system security vulnerabilities and attacks

Confidence0.75

Prompt leaking

"Prompt leaking is another type of prompt injection attack designed to expose details contained in private prompts. According to [58], prompt leaking is the act of misleading the model to print the pre-designed instru...

Domain2. Privacy & SecuritySubdomain2.2 > AI system security vulnerabilities and attacks

Confidence0.72

Goal Hijacking

"Goal hijacking is a type of primary attack in prompt injection [58]. By injecting a phrase like “Ignore the above instruction and do ...” in the input, the attack could hijack the original goal of the designed prompt...

Domain2. Privacy & SecuritySubdomain2.2 > AI system security vulnerabilities and attacks

Confidence0.71

Jailbreak in LLM Malicious Use - Prompt Attacks

"In the prompting and reasoning phase, dialog can push LLMs into confused or overly compliant states, raising the risk of producing harmful outputs when confronted with harmful questions. Most of the jailbreak methods...

Domain2. Privacy & SecuritySubdomain2.2 > AI system security vulnerabilities and attacks

Confidence0.70

Showing 4 of 10