Modify AI Agent Configuration - AI Security Technique

Overview

A source-backed snapshot of this AI security technique.

Adversaries may modify the configuration files for AI agents on a system. This allows malicious changes to persist beyond the life of a single agent and affects any agents that share the configuration.

Configuration changes may include modifications to the system prompt, tampering with or replacing knowledge sources, modification to settings of connected tools, and more. Through those changes, an attacker could redirect outputs or tools to malicious services, embed covert instructions that exfiltrate data, or weaken security controls that normally restrict agent behavior.

Adversaries may modify or disable a configuration setting related to security controls, such as those that would prevent the AI Agent from taking actions that may be harmful to the user's system without human-in-the-loop oversight. Disabling AI agent security features may allow adversaries to achieve their malicious goals and maintain long-term corruption of the AI agent.

Tactics2Attacker goals connected to this method.

Mitigations0Defenses that may help against this attack.

AI risks5Research-backed risks connected to this topic.

Technique details

Identifiers, maturity, and source taxonomy for this technique.

ATLAS ID: AML.T0081
Maturity: demonstrated
Priority score: 75

ATLAS tactics

Defense EvasionPersistence

Attack flow

How to read the public records connected to this technique.

1. TechniqueRead the ATLAS description and evidence level.

2. TacticsSee which attacker goals this method supports.

3. ExamplesCheck whether public case studies mention it.

4. DefensesReview safeguards mapped by ATLAS.

5. SourcesOpen the original public records and references.

Impact

Why this technique may deserve attention in the current dataset.

Evidence leveldemonstrated
Mapped defenses0 ATLAS mitigation records
Public examples3 linked case study records
Research risks5 related MIT AI Risk records above the confidence threshold
Vulnerabilities0 linked CVE records

Mitigations

Defenses that may help against this attack.

View all mitigations →

No connected defenses. No defense is connected to this attack in the current data.

Case studies

Examples from public reports and exercises.

3 recordsView all case studies →

OpenClaw Command & Control via Prompt Injection

Researchers at HiddenLayer demonstrated how a webpage can embed an indirect prompt injection that causes OpenClaw to silently execute a malicious script. Once executed, the script plants persistent malicious instructions into future system prompts, allowing the attacker to issue new commands, turning OpenClaw into a command and control agent.

What makes this attack unique is that, through a simple indirect prompt injection attack into an agentic lifecycle, untrusted content can be used to spoof the model’s control scheme and induce unapproved tool invocation for execution. Through this single inject, an LLM can become a persistent, automated command & control implant.

Date2026-02-03

exercise

OpenClaw 1-Click Remote Code Execution

A security researcher demonstrated a 1-click remote code execution (RCE) vulnerability to the OpenClaw AI Agent via a malicious link containing a JavaScript script that only takes milliseconds to execute. This vulnerability has been reported and is being tracked to versions of OpenClaw as CVE-2026-25253. ^[1] OpenClaw “is a personal AI assistant you run on your own devices. It answers you on the chat apps you already use. Unlike SaaS assistants where your data lives on someone else’s servers, OpenClaw runs where you choose – laptop, homelab, or VPS. Your infrastructure. Your keys. Your data.” ^[2]

The researcher demonstrated that when the victim clicks a malicious link, a client-side JavaScript script is executed on the victim’s browser that can steal authentication tokens from the OpenClaw control interface via a WebSocket connection. It then uses Cross-Site WebSocket Hijacking to bypass localhost restrictions to the OpenClaw Gateway API. Once the connection was established, it uses the stolen token to authenticate and modify the OpenClaw agent configuration to disable user confirmation and escape the container, allowing shell commands to be run directly on the host machine.

References

Date2026-02-01

exercise

Rules File Backdoor: Supply Chain Attack on AI Coding Assistants

Pillar Security researchers demonstrated how adversaries can compromise AI-generated code by injecting malicious instructions into rules files used to configure AI coding assistants like Cursor and GitHub Copilot. The attack uses invisible Unicode characters to hide malicious prompts that manipulate the AI to insert backdoors, vulnerabilities, or malicious scripts into generated code. These poisoned rules files are distributed through open-source repositories and developer communities, creating a scalable supply chain attack that could affect millions of developers and end users through compromised software.

Vendor Response to Responsible Disclosure:

Cursor: Determined that this risk falls under the users’ responsibility.
GitHub Copilot: Implemented a new security feature that displays a warning when a file's contents include hidden Unicode text on github.com.

Date2025-03-18

exercise

Related risks

Research-backed risks connected to this topic.

5 recordsView all risks →

Attacking LLMs via Additional Modalities a

"LLMs can now process modalities other than text, e.g. images or video frames (OpenAI, 2023c; Gemini Team, 2023). Several studies show that gradient-based attacks on multimodal models are easy and effective (Carlini e...

Domain2. Privacy & SecuritySubdomain2.2 > AI system security vulnerabilities and attacks

Confidence0.67

Adversarial AI: Data and Model Exfiltration Attacks

"Other forms of abuse can include privacy attacks that allow adversaries to exfiltrate or gain knowledge of the private training data set or other valuable assets. For example, privacy attacks such as membership infer...

Domain2. Privacy & SecuritySubdomain2.1 > Compromise of privacy by leaking or correctly inferring sensitive information

Confidence0.67

Jailbreak of a multimodal model

"Current generation multimodal (e.g., vision and language) GPAI models are vulnerable to adversarial jailbreak attacks. These attacks can be used to automatically induce a model to produce an arbitrary or specific out...

Domain2. Privacy & SecuritySubdomain2.2 > AI system security vulnerabilities and attacks

Confidence0.67

Model extraction

"Data Exfiltration goes beyond revealing private information, and involves illicitly obtaining the training data used to build a model that may be sensitive or proprietary. Model Extraction is the same attack, only di...

Domain2. Privacy & SecuritySubdomain2.2 > AI system security vulnerabilities and attacks

Confidence0.67

Showing 4 of 5