Record summary
A quick snapshot of what this page covers.
Risk profile
How this risk is described and categorized.
"Adversarial tool providers can embed malicious instructions in the APIs or prompts [84], leading LLMs to leak memorized sensitive information in the training data or users’ prompts (CVE2023-32786). As a result, LLMs lack control over the output, resulting in sensitive information being disclosed to external tool providers. Besides, attackers can easily manipulate public data to launch targeted attacks, generating specific malicious outputs according to user inputs. Furthermore, feeding the information from external tools into LLMs may lead to injection attacks [61]. For example, unverified inputs may result in arbitrary code execution (CVE-2023-29374)."
Suggested mitigations
Defenses that may help with related attacks.
Memory Hardening
Restrict Library Loading
Verify AI Artifacts
Vulnerability Scanning
User Training
AI Bill of Materials
AI Telemetry Logging
Privileged AI Agent Permissions Configuration
Single-User AI Agent Permissions Configuration
AI Agent Tools Permissions Configuration
Human In-the-Loop for AI Agent Actions
Restrict AI Agent Tool Invocation on Untrusted Data
Segmentation of AI Agent Components
Input and Output Validation for AI Agent Components
Code Signing
Generative AI Guardrails
Generative AI Guidelines
Generative AI Model Alignment
Control Access to AI Models and Data in Production
Use Ensemble Methods
Control Access to AI Models and Data at Rest
Validate AI Model
Sanitize Training Data
Maintain AI Dataset Provenance
Source
Research source for this risk, when available.
Included resource
Risk Taxonomy, Mitigation, and Assessment Benchmarks of Large Language Model Systems
Original source
MIT AI Risk Repository
Open the public repository used for AI risk records and taxonomy fields.