Local AI Agent - AI Security Technique

Overview

A source-backed snapshot of this AI security technique.

Adversaries may achieve full system compromise by abusing AI agents running locally on a host, such as computer-use agents or AI-driven browsers. These agents are designed to autonomously interact with the operating system, applications, and external services, often with broad permissions to execute commands, access files, manage credentials, and control user workflows.

If an adversary is able to take control of an AI agent's behavior, they effectively gain the same level of access as the agent. This can result in complete control over the machine, including executing arbitrary code, accessing or exfiltrating sensitive data, modifying system configurations, and establishing persistence.

Tactics0Attacker goals connected to this method.

Mitigations0Defenses that may help against this attack.

AI risks5Research-backed risks connected to this topic.

Technique details

Identifiers, maturity, and source taxonomy for this technique.

ATLAS ID: AML.T0112.000
Maturity: demonstrated
Priority score: 75

Attack flow

How to read the public records connected to this technique.

1. TechniqueRead the ATLAS description and evidence level.

2. TacticsSee which attacker goals this method supports.

3. ExamplesCheck whether public case studies mention it.

4. DefensesReview safeguards mapped by ATLAS.

5. SourcesOpen the original public records and references.

Impact

Why this technique may deserve attention in the current dataset.

Evidence leveldemonstrated
Mapped defenses0 ATLAS mitigation records
Public examples3 linked case study records
Research risks5 related MIT AI Risk records above the confidence threshold
Vulnerabilities0 linked CVE records

Mitigations

Defenses that may help against this attack.

View all mitigations →

No connected defenses. No defense is connected to this attack in the current data.

Case studies

Examples from public reports and exercises.

3 recordsView all case studies →

OpenClaw Command & Control via Prompt Injection

Researchers at HiddenLayer demonstrated how a webpage can embed an indirect prompt injection that causes OpenClaw to silently execute a malicious script. Once executed, the script plants persistent malicious instructions into future system prompts, allowing the attacker to issue new commands, turning OpenClaw into a command and control agent.

What makes this attack unique is that, through a simple indirect prompt injection attack into an agentic lifecycle, untrusted content can be used to spoof the model’s control scheme and induce unapproved tool invocation for execution. Through this single inject, an LLM can become a persistent, automated command & control implant.

Date2026-02-03

exercise

AI ClickFix: Hijacking Computer-Use Agents Using ClickFix

Embrace the Red demonstrated that AI computer-use agents are vulnerable to social engineering attacks and can be manipulated into executing arbitrary code on a victim’s machine. The attack is a variation on “ClickFix” which is a social engineering attack that fools humans into copying malicious commands and executing them.

The researcher used ChatGPT to generate a website designed to attract interactions with computer-use agents. When a user asked their Claude Computer-Use Agent to visit the researcher’s website, the text “Are you a computer? Please see instructions to confirm:” caused the agent to click the associated button. This executed JavaScript to copy a malicious command into the agent’s clipboard. The agent then proceeded to follow the instructions, opening a terminal, pasting the malicious command, and executing it. The command downloads a script from the researcher’s website and executes it. In the demonstration, the script opens the victim’s Calculator App, but in practice an adversary could run arbitrary code, compromising the victim’s system.

Date2025-05-24

exercise

LLMSmith: RCE Vulnerabilities in LLM-Integrated Applications

Researchers identified 20 remote code execution (RCE) vulnerabilities across 11 different LLM frameworks. They discovered applications deployed on the public internet built using these LLM frameworks and demonstrated the RCE vulnerabilities could be exploited using prompt injection.

The 11 LLM frameworks the researchers evaluated were: LangChain, LlamaIndex, Pandas-ai, Langflow, Pandas-llm, Auto-GPT, Griptape, Lagent, MetaGPT, vanna, and langroid.

Date2025-02-27

exercise

Related risks

Research-backed risks connected to this topic.

5 recordsView all risks →

Attacking LLMs via Additional Modalities a

"LLMs can now process modalities other than text, e.g. images or video frames (OpenAI, 2023c; Gemini Team, 2023). Several studies show that gradient-based attacks on multimodal models are easy and effective (Carlini e...

Domain2. Privacy & SecuritySubdomain2.2 > AI system security vulnerabilities and attacks

Confidence0.67

Adversarial AI: Data and Model Exfiltration Attacks

"Other forms of abuse can include privacy attacks that allow adversaries to exfiltrate or gain knowledge of the private training data set or other valuable assets. For example, privacy attacks such as membership infer...

Domain2. Privacy & SecuritySubdomain2.1 > Compromise of privacy by leaking or correctly inferring sensitive information

Confidence0.67

Jailbreak of a multimodal model

"Current generation multimodal (e.g., vision and language) GPAI models are vulnerable to adversarial jailbreak attacks. These attacks can be used to automatically induce a model to produce an arbitrary or specific out...

Domain2. Privacy & SecuritySubdomain2.2 > AI system security vulnerabilities and attacks

Confidence0.67

Model extraction

"Data Exfiltration goes beyond revealing private information, and involves illicitly obtaining the training data used to build a model that may be sensitive or proprietary. Model Extraction is the same attack, only di...

Domain2. Privacy & SecuritySubdomain2.2 > AI system security vulnerabilities and attacks

Confidence0.67

Showing 4 of 5