Record summary
A quick snapshot of what this page covers.
Risk profile
How this risk is described and categorized.
"LMs may provide true, sensitive information that is present in the training data. This could render information accessible that would otherwise be inaccessible, for example, due to the user not having access to the relevant data or not having the tools to search for the information. Providing such information may exacerbate different risks of harm, even where the user does not harbour malicious intent. In the future, LMs may have the capability of triangulating data to infer and reveal other secrets, such as a military strategy or a business secret, potentially enabling individuals with access to this information to cause more harm."
Suggested mitigations
Defenses that may help with related attacks.
AI Telemetry Logging
Privileged AI Agent Permissions Configuration
Single-User AI Agent Permissions Configuration
AI Agent Tools Permissions Configuration
Segmentation of AI Agent Components
Restrict Library Loading
Code Signing
Vulnerability Scanning
User Training
AI Bill of Materials
Generative AI Guardrails
Generative AI Guidelines
Generative AI Model Alignment
Human In-the-Loop for AI Agent Actions
Restrict AI Agent Tool Invocation on Untrusted Data
Input and Output Validation for AI Agent Components
Verify AI Artifacts
Control Access to AI Models and Data at Rest
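As a rough illustration of how output-side mitigations such as Generative AI Guardrails or Input and Output Validation might be applied to this risk, the sketch below scans model output for pattern-shaped sensitive data and redacts it before the text reaches the user. Everything named here (`SENSITIVE_PATTERNS`, `redact_output`, and the two regular expressions) is a hypothetical assumption for illustration, not part of any listed mitigation's specification. A pattern filter of this kind would not catch secrets inferred by triangulating data, which the risk description also raises.

```python
import re

# Hypothetical patterns chosen for illustration only; a real deployment
# would likely rely on a vetted sensitive-data detector instead.
SENSITIVE_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def redact_output(text):
    """Redact pattern-shaped sensitive data from model output.

    Returns a tuple of (redacted_text, labels_that_matched).
    """
    hits = []
    for label, pattern in SENSITIVE_PATTERNS.items():
        # subn returns the new string and the number of substitutions made.
        text, count = pattern.subn(f"[REDACTED {label}]", text)
        if count:
            hits.append(label)
    return text, hits


if __name__ == "__main__":
    sample = "Contact jane@example.com, SSN 123-45-6789."
    redacted, matched = redact_output(sample)
    print(redacted)
    print(matched)
```

The list of matched labels could also be written to a log, in the spirit of AI Telemetry Logging, so that repeated disclosures are visible to operators.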
Source
Research source for this risk, when available.
Included resource
Ethical and social risks of harm from language models
Original source
MIT AI Risk Repository
Open the public repository used for AI risk records and taxonomy fields.
