APromptRiskDBThreat intelligence atlas
AI Security Technique

Modify AI Agent Configuration - AI Security Technique

Adversaries may modify the configuration files for AI agents on a system. This allows malicious changes to persist beyond the life of a single agent and affects any agents that share the configuration. Configuration changes may include modifications to the system prompt, tampering with or replacing knowledge sources, modification to settings of connected tools, and more. Through those changes, an attacker could re...

AI Security TechniquedemonstratedDefense EvasionPersistence

Record summary

A quick snapshot of what this page covers.

Tactics2Attacker goals connected to this method.
Mitigations0Defenses that may help against this attack.
AI risks5Research-backed risks connected to this topic.

Attack context

How this AI attack works in practice.

Adversaries may modify the configuration files for AI agents on a system. This allows malicious changes to persist beyond the life of a single agent and affects any agents that share the configuration.

Configuration changes may include modifications to the system prompt, tampering with or replacing knowledge sources, modification to settings of connected tools, and more. Through those changes, an attacker could redirect outputs or tools to malicious services, embed covert instructions that exfiltrate data, or weaken security controls that normally restrict agent behavior.

Adversaries may modify or disable a configuration setting related to security controls, such as those that would prevent the AI Agent from taking actions that may be harmful to the user's system without human-in-the-loop oversight. Disabling AI agent security features may allow adversaries to achieve their malicious goals and maintain long-term corruption of the AI agent.

ATLAS ID
AML.T0081
Priority score
75
Maturity: demonstrated
Defense EvasionPersistence

Mitigations

Defenses that may help against this attack.

No connected defenses. No defense is connected to this attack in the current data.

Case studies

Examples from public reports and exercises.

OpenClaw Command & Control via Prompt Injection

exercise
Date2026-02-03

Researchers at HiddenLayer demonstrated how a webpage can embed an indirect prompt injection that causes OpenClaw to silently execute a malicious script. Once executed, the script plants persistent malicious instructions into future system prompts, allowing the attacker to issue new commands, turning OpenClaw into a command and control agent.

What makes this attack unique is that, through a simple indirect prompt injection attack into an agentic lifecycle, untrusted content can be used to spoof the model’s control scheme and induce unapproved tool invocation for execution. Through this single inject, an LLM can become a persistent, automated command & control implant.

OpenClaw 1-Click Remote Code Execution

exercise
Date2026-02-01

A security researcher demonstrated a 1-click remote code execution (RCE) vulnerability to the OpenClaw AI Agent via a malicious link containing a JavaScript script that only takes milliseconds to execute. This vulnerability has been reported and is being tracked to versions of OpenClaw as CVE-2026-25253. [<sup>\[1\]</sup>][1] OpenClaw “is a personal AI assistant you run on your own devices. It answers you on the chat apps you already use. Unlike SaaS assistants where your data lives on someone else’s servers, OpenClaw runs where you choose – laptop, homelab, or VPS. Your infrastructure. Your keys. Your data.” [<sup>\[2\]</sup>][2]

The researcher demonstrated that when the victim clicks a malicious link, a client-side JavaScript script is executed on the victim’s browser that can steal authentication tokens from the OpenClaw control interface via a WebSocket connection. It then uses Cross-Site WebSocket Hijacking to bypass localhost restrictions to the OpenClaw Gateway API. Once the connection was established, it uses the stolen token to authenticate and modify the OpenClaw agent configuration to disable user confirmation and escape the container, allowing shell commands to be run directly on the host machine.

References

  1. [1] https://nvd.nist.gov/vuln/detail/CVE-2026-25253
  2. [2] https://openclaw.ai/blog/introducing-openclaw

Rules File Backdoor: Supply Chain Attack on AI Coding Assistants

exercise
Date2025-03-18

Pillar Security researchers demonstrated how adversaries can compromise AI-generated code by injecting malicious instructions into rules files used to configure AI coding assistants like Cursor and GitHub Copilot. The attack uses invisible Unicode characters to hide malicious prompts that manipulate the AI to insert backdoors, vulnerabilities, or malicious scripts into generated code. These poisoned rules files are distributed through open-source repositories and developer communities, creating a scalable supply chain attack that could affect millions of developers and end users through compromised software.

Vendor Response to Responsible Disclosure:

  • Cursor: Determined that this risk falls under the users’ responsibility.
  • GitHub Copilot: Implemented a new security feature that displays a warning when a file's contents include hidden Unicode text on github.com.

Source

Where this page information comes from.