AI Agent Tool - AI Security Technique

Overview

A source-backed snapshot of this AI security technique.

Adversaries may target AI agent tools as a means to compromise a victim's AI supply chain. Tools add capabilities to AI agents, allowing them to interact with other services, connect to data sources, access internet resources, run system tools, and execute code. They are an attractive target for adversaries because compromising an AI agent can provide them with broad accesses and permissions on the victim's system via the agent's other tools.

Poisoned agent tools (See AI Agent Tool Poisoning) can contain malicious code or LLM Prompt Injections that manipulate the agent's behavior and even modify how other tools are called. Adversaries have successfully used a poisoned MCP server to exfiltrate private user data [\[5\]][koi].

Agent tools have exploded in popularity, with thousands of MCP servers available publicly [\[2\]][glama]. They are often released on open-source software repositories such as GitHub, indexed on hubs specific to MCP servers [\[3\]][mcp-hub][\[4\]][mcp-server-hub], and published to package registries such as NPM. AI agents can also be connected to remotely-hosted tools [\[5\]][remote-mcp]. This creates an environment where malicious tools can proliferate rapidly and safeguards are often not in place.

[koi]: https://www.koi.ai/blog/postmark-mcp-npm-malicious-backdoor-email-theft "First Malicious MCP in the Wild: The Postmark Backdoor That's Stealing Your Emails" [glama]: https://glama.ai/mcp/servers "Glama" [mcp-hub]: https://www.mcphub.ai/ "MCP Hub" [mcp-server-hub]: https://mcpserverhub.com/ "MCP Server Hub" [remote-mcp]: https://mcpservers.org/remote-mcp-servers "Remote MCP Servers"

Tactics0Attacker goals connected to this method.

Mitigations0Defenses that may help against this attack.

AI risks22Research-backed risks connected to this topic.

Technique details

Identifiers, maturity, and source taxonomy for this technique.

ATLAS ID: AML.T0010.005
Maturity: realized
Priority score: 170

Attack flow

How to read the public records connected to this technique.

1. TechniqueRead the ATLAS description and evidence level.

2. TacticsSee which attacker goals this method supports.

3. ExamplesCheck whether public case studies mention it.

4. DefensesReview safeguards mapped by ATLAS.

5. SourcesOpen the original public records and references.

Impact

Why this technique may deserve attention in the current dataset.

Evidence levelrealized
Mapped defenses0 ATLAS mitigation records
Public examples3 linked case study records
Research risks22 related MIT AI Risk records above the confidence threshold
Vulnerabilities0 linked CVE records

Mitigations

Defenses that may help against this attack.

View all mitigations →

No connected defenses. No defense is connected to this attack in the current data.

Case studies

Examples from public reports and exercises.

3 recordsView all case studies →

Supply Chain Compromise via Poisoned ClawdBot Skill

A security researcher demonstrated a proof-of-concept supply chain attack using a poisoned ClawdBot Skill shared on ClawdHub, a Skill registry for agents. The poisoned Skill contained a prompt injection that caused ClawdBot to execute a shell command that reached the researcher's server. Although the researcher here used this access simply to warn users about the danger, they could have instead delivered a malicious payload and compromised the user's system. The security researcher recorded 16 different users who downloaded and executed the poisoned Skill in the first 8 hours of it being published on ClawdHub.

Date2026-01-26

exercise

Poisoned Postmark MCP Server Email Exfiltration

A bad actor successfully exfiltrated emails from users of the Postmark’s MCP server via a supply chain attack. Postmark is an email delivery service that allows organizations to send marketing and transactional emails via API. The Postmark MCP server allows users to interact with Postmark via AI agents.

The bad actor impersonated Postmark, by registering the postmark-mcp package name on npm. They initially published the legitimate versions of the MCP server. After the package became popular and reached over 1,000 downloads per week, the bad actor performed a rugpull and uploaded a malicious version of the package. The malicious version added the bad actor’s email address in the BCC line of all emails sent by the MCP tool. Users who upgraded to this version and continued to use the tool would have all emails exfiltrated to the bad actor.

Date2025-09-01

incident

Data Exfiltration via Remote Poisoned MCP Tool

Researchers at Invariant Labs demonstrated that AI agents configured with remote Model Context Protocol (MCP) Tools can be vulnerable to model poisoning attacks. They show that an MCP Tool can contain malicious prompts in its docstring description, which is ingested into the AI agent’s context, modifying its behavior.

They demonstrate this attack with a proof-of-concept MCP Tool that instructs the agent to perform additional actions before using the tool. The agent is instructed to read files containing credentials from the victim’s machine and store their contents in one of the input variables to the tool. When the tool runs, the victim’s credentials are exfiltrated to the poisoned MCP server.

Date2025-04-01

exercise

Related risks

Research-backed risks connected to this topic.

Top 10 of 22View all risks →

Fine-tuning related (Fine-tuning dataset poisoning)

"A deployer can poison the dataset used during the fine-tuning process [98] to induce specific, often malicious, behaviors in a model. This can be performed without having access to the model’s weights. This poisoning...

Domain2. Privacy & SecuritySubdomain2.2 > AI system security vulnerabilities and attacks

Confidence0.74

Poisoning

"Data Poisoning involves deliberately corrupting a model’s training dataset to introduce vulnerabilities, derail its learning process, or cause it to make incorrect predictions (Carlini et al., 2023). For example, the...

Domain2. Privacy & SecuritySubdomain2.2 > AI system security vulnerabilities and attacks

Confidence0.74

Vulnerabilities arising from additional modalities in multimodal models

"Additional modalities can introduce new attack vectors in multimodal models as well as expand the scope of the previous attacks, ranging from jailbreaking to poisoning [13]. Typically, different modalities have diffe...

Domain2. Privacy & SecuritySubdomain2.2 > AI system security vulnerabilities and attacks

Confidence0.73

Data poisoning

"Data poisoning describes an attack in the form of an injection of malicious data into the training set. If not prevented, this attack leads the AI system to learn unintended behavior."

Domain2. Privacy & SecuritySubdomain2.2 > AI system security vulnerabilities and attacks

Confidence0.72

Showing 4 of 10