Record summary
A quick snapshot of what this page covers.
Risk profile
How this risk is described and categorized.
"First, because LLMs display immense modelling power, there is a risk that the model weights encode private information present in the training corpus. In particular, it is possible for LLMs to ‘memorise’ personally identifiable information (PII) such as names, addresses and telephone numbers, and subsequently leak such information through generated text outputs (Carlini et al., 2021). Private information leakage could occur accidentally or as the result of an attack in which a person employs adversarial prompting to extract private information from the model. In the context of pre-training data extracted from online public sources, the issue of LLMs potentially leaking training data underscores the challenge of the ‘privacy in public’ paradox for the ‘right to be let alone’ paradigm and highlights the relevance of the contextual integrity paradigm for LLMs. Training data leakage can also affect information collected for the purpose of model refinement (e.g. via fine-tuning on user feedback) at later stages in the development cycle. Note, however, that the extraction of publicly available data from LLMs does not render the data more sensitive per se, but rather the risks associated with such extraction attacks needs to be assessed in light of the intentions and culpability of the user extracting the data."
Suggested mitigations
Defenses that may help with related attacks.
Restrict Number of AI Model Queries
Control Access to AI Models and Data in Production
AI Telemetry Logging
Control Access to AI Models and Data at Rest
Sanitize Training Data
Verify AI Artifacts
Maintain AI Dataset Provenance
Source
Research source for this risk, when available.
Included resource
The Ethics of Advanced AI Assistants
Original source
MIT AI Risk Repository
Open the public repository used for AI risk records and taxonomy fields.
