PromptRiskDB: Threat intelligence atlas
AI Security Technique

LLM Trusted Output Components Manipulation - AI Security Technique


AI Security Technique | demonstrated | Defense Evasion

Record summary

A quick snapshot of what this page covers.

Tactics: 1. Attacker goals connected to this method.
Mitigations: 0. Defenses that may help against this attack.
AI risks: 0. Research-backed risks connected to this topic.

Attack context

How this AI attack works in practice.

Adversaries may craft prompts that cause a large language model (LLM) to manipulate components of its response so that the output appears trustworthy to the user. This helps the adversary continue to operate in the victim's environment and evade detection by the users the LLM interacts with.

The LLM may be instructed to tailor its language to appear more trustworthy, or to manipulate the user into taking certain actions. Other response components that could be manipulated include links, recommended follow-up actions, retrieved document metadata, and citations.
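One way to picture manipulated links and citations is to compare the URLs that appear in a response against the sources the system actually retrieved. The sketch below is illustrative only: the function name `unverified_links`, the URL pattern, and the host-matching rule are assumptions made for this example, not part of the ATLAS entry, and the check is not a mitigation listed for this technique.

```python
import re
from urllib.parse import urlparse

# Deliberately simple URL matcher (an assumption for this sketch); it stops at
# whitespace and at common closing punctuation used in Markdown links.
URL_PATTERN = re.compile(r"https?://[^\s)\]>\"']+")


def unverified_links(response_text, retrieved_urls):
    """Return links in the response whose host matches no retrieved source.

    `retrieved_urls` is a hypothetical list of the documents the application
    actually fetched before the model produced `response_text`.
    """
    trusted_hosts = {urlparse(u).hostname for u in retrieved_urls}
    flagged = []
    for match in URL_PATTERN.findall(response_text):
        url = match.rstrip(".,;:")  # drop trailing sentence punctuation
        if urlparse(url).hostname not in trusted_hosts:
            flagged.append(url)
    return flagged


if __name__ == "__main__":
    response = (
        "See [the docs](https://docs.example.com/guide) and reset your "
        "password at https://examp1e-login.test/reset."
    )
    print(unverified_links(response, ["https://docs.example.com/guide"]))
```

A link that passes this check can still be misleading (for example, a trusted host with a manipulated path or anchor text), so the output is best read as a list of items for human review rather than a verdict.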

ATLAS ID: AML.T0067
Priority score: 30
Maturity: demonstrated
Tactic: Defense Evasion

Mitigations

Defenses that may help against this attack.

No defenses are connected to this attack in the current data.

Case studies

Examples from public reports and exercises.

Rules File Backdoor: Supply Chain Attack on AI Coding Assistants

Type: exercise
Date: 2025-03-18

Pillar Security researchers demonstrated how adversaries can compromise AI-generated code by injecting malicious instructions into the rules files used to configure AI coding assistants such as Cursor and GitHub Copilot. The attack uses invisible Unicode characters to hide malicious prompts that manipulate the AI into inserting backdoors, vulnerabilities, or malicious scripts into generated code. These poisoned rules files are distributed through open-source repositories and developer communities, creating a scalable supply chain attack that could affect millions of developers and end users through compromised software.

Vendor Response to Responsible Disclosure:

  • Cursor: Determined that this risk falls under the users’ responsibility.
  • GitHub Copilot: Implemented a new security feature that displays a warning when a file's contents include hidden Unicode text on github.com.
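The hidden-text check described in the case study can be approximated by scanning a rules file for Unicode format characters. This is a minimal sketch, not the vendors' implementation: the function name `find_hidden_characters` is hypothetical, and treating Unicode general category `Cf` (zero-width characters, bidirectional controls, tag characters) as "invisible" is an assumption made for this example.

```python
import unicodedata


def find_hidden_characters(text):
    """List invisible format characters found in `text`.

    Returns (line, column, code point, name) tuples, 1-indexed.
    Assumption: characters in Unicode general category "Cf" are the ones
    worth flagging; other invisible characters are not covered.
    """
    findings = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if unicodedata.category(ch) == "Cf":
                findings.append(
                    (line_no, col, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNNAMED"))
                )
    return findings


if __name__ == "__main__":
    # A rules-file line with a zero-width space and zero-width joiner appended.
    sample = "Use tabs for indentation.\u200b\u200d\nPrefer const."
    for finding in find_hidden_characters(sample):
        print(finding)
```

Some legitimate text also contains `Cf` characters (emoji sequences joined with a zero-width joiner, or a leading byte order mark), so a match signals that a file deserves a closer look rather than proving it is malicious.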

Source

Where this page's information comes from.