Record summary
A quick snapshot of what this page covers.
Risk profile
How this risk is described and categorized.
"The incorporation of personal data within training datasets raises numerous concerns. The primary issue is that personal data may be incorporated without the knowledge or consent of the individuals concerned, even though the data may include names, identification numbers, Social Security numbers, or other personal information. Another particularly difficult problem is related to the fact that complex models may “memorize” (i.e., store) specific threads of training data and regurgitate them when responding to a prompt. This data memorization can directly lead to leakage of personal data. Even if generative AI models do not memorize or leak personal data, they make it possible to recognize patterns or information structures that could enable malicious users to uncover personal details."
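The two failure modes in the quotation (personal identifiers entering a training set, and a model regurgitating training text) can be illustrated with a minimal Python sketch. It is not taken from the source. It assumes identifiers written in the US Social Security number layout NNN-NN-NNNN and treats any run of `min_words` consecutive words copied verbatim from a training record as a possible sign of memorization. The function names, the pattern, and the threshold are all hypothetical choices for illustration; real PII detection and memorization audits need far broader coverage.

```python
import re

# Assumption: identifiers follow the US SSN layout NNN-NN-NNNN.
# Other identifier formats are not detected by this pattern.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def find_ssn_like(text: str) -> list[str]:
    """Return every substring of `text` that matches the SSN-like pattern."""
    return SSN_PATTERN.findall(text)


def redact_ssn_like(text: str, placeholder: str = "[REDACTED]") -> str:
    """Replace SSN-like substrings before the text is added to a training set."""
    return SSN_PATTERN.sub(placeholder, text)


def leaked_spans(output: str, training_records: list[str], min_words: int = 5) -> list[str]:
    """Return word windows of length `min_words` from `output` that occur
    verbatim in at least one training record (a crude regurgitation check)."""
    # Normalize whitespace so that line breaks do not hide a match.
    normalized = [" ".join(record.split()) for record in training_records]
    words = output.split()
    found: list[str] = []
    for start in range(len(words) - min_words + 1):
        window = " ".join(words[start:start + min_words])
        if window not in found and any(window in record for record in normalized):
            found.append(window)
    return found
```

For example, `redact_ssn_like` could be applied to each record before training, while `leaked_spans` could be run on sampled model outputs against a held-back copy of the training records to flag verbatim overlap for manual review.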
Suggested mitigations
Defenses that may help with related attacks.
Restrict Library Loading
Verify AI Artifacts
Vulnerability Scanning
User Training
AI Bill of Materials
Limit Model Artifact Release
Control Access to AI Models and Data at Rest
Encrypt Sensitive Information
AI Model Distribution Methods
Code Signing
Use Ensemble Methods
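As an illustration of one item in the list above, "Verify AI Artifacts" is often approached by comparing a cryptographic checksum of a downloaded model or dataset file against a digest published by its provider. The following Python sketch assumes the provider publishes a SHA-256 hex digest; the function names are hypothetical, and the record itself does not prescribe any particular implementation.

```python
import hashlib
import hmac


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 hex digest of a file, reading it in chunks
    so that large model files do not have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_artifact(path: str, expected_hex: str) -> bool:
    """Return True only if the file's digest equals the published digest.
    Assumption: `expected_hex` was obtained over a trusted channel."""
    return hmac.compare_digest(sha256_of_file(path), expected_hex.strip().lower())
```

A loader would call `verify_artifact` before deserializing the file and refuse to proceed when it returns `False`.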
Source
Research source for this risk, when available.
Included resource
Regulating under Uncertainty: Governance Options for Generative AI
Original source
MIT AI Risk Repository
