Risks from leaking or correctly inferring sensitive information

Record summary

A quick snapshot of what this page covers.

Techniques: 19. Attack methods connected to this risk.

Mitigations: 18. Defenses that may help with related attacks.

Domain: 2. Privacy & Security. The broad risk area this belongs to.

Risk profile

How this risk is described and categorized.

"LMs may provide true, sensitive information that is present in the training data. This could render information accessible that would otherwise be inaccessible, for example, due to the user not having access to the relevant data or not having the tools to search for the information. Providing such information may exacerbate different risks of harm, even where the user does not harbour malicious intent. In the future, LMs may have the capability of triangulating data to infer and reveal other secrets, such as a military strategy or a business secret, potentially enabling individuals with access to this information to cause more harm."

Domain: 2. Privacy & Security

Subdomain: 2.1 > Compromise of privacy by leaking or correctly inferring sensitive information

Entity: 3 - Other

Intent: 3 - Other

Timing: 2 - Post-deployment

Category: Information Hazards

Subcategory: Risks from leaking or correctly inferring sensitive information

Related techniques

Attack methods connected to this risk.

Suggested mitigations

Defenses that may help with related attacks.

Source

Research source for this risk, when available.

Included resource

Ethical and social risks of harm from language models

Authors: Weidinger et al.

Year: 2021

Type: Preprint

DOI: 10.48550/arXiv.2112.04359

URL: https://arxiv.org/abs/2112.04359

Original source

MIT AI Risk Repository

Open the public repository used for AI risk records and taxonomy fields.

Repository: https://airisk.mit.edu/

Suggested mitigations

AI Telemetry Logging

Privileged AI Agent Permissions Configuration

Single-User AI Agent Permissions Configuration

AI Agent Tools Permissions Configuration

Segmentation of AI Agent Components

Restrict Library Loading

Code Signing

Vulnerability Scanning

User Training

AI Bill of Materials

Generative AI Guardrails

Generative AI Guidelines

Generative AI Model Alignment

Human In-the-Loop for AI Agent Actions

Restrict AI Agent Tool Invocation on Untrusted Data

Input and Output Validation for AI Agent Components

Verify AI Artifacts

Control Access to AI Models and Data at Rest
