Privacy and data collection concerns (collecting personal information or personally identifiable information)

Record summary

A quick snapshot of what this page covers.

Techniques: 23. Attack methods connected to this risk.

Mitigations: 20. Defenses that may help with related attacks.

Domain: 2. Privacy & Security. The broad risk area this belongs to.

Risk profile

How this risk is described and categorized.

"Generative AI developers train their models with extensive datasets often gathered through online web scraping of websites that may include personal data or personally identifiable information (PII). For most generative AI applications, such as initial model training, the primary concerns are the quantity, variety, and quality of the data, not whether they include personally identifiable information. However, some web-scraped datasets may inadvertently include personal data. Additionally, when downstream developers integrate generative AI into their products or services by fine-tuning a pre-trained model, they often use their own in-house data, which may include personal information."

Domain: 2. Privacy & Security

Subdomain: 2.1 > Compromise of privacy by leaking or correctly inferring sensitive information

Entity: 1 - Human

Intent: 2 - Unintentional

Timing: 1 - Pre-deployment

Category: Legal challenges

Subcategory: Privacy and data collection concerns (collecting personal information or personally identifiable information)

Related techniques

Attack methods connected to this risk.

Suggested mitigations

Defenses that may help with related attacks.

Source

Research source for this risk, when available.

Included resource

Regulating under Uncertainty: Governance Options for Generative AI

Authors: G'sell
Year: 2024
Type: Report

DOI: 10.2139/ssrn.4918704
URL: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4918704

Original source

MIT AI Risk Repository

Open the public repository used for AI risk records and taxonomy fields.

Repository: https://airisk.mit.edu/

Suggested mitigations

Passive AI Output Obfuscation

Restrict Number of AI Model Queries

AI Telemetry Logging

Limit Model Artifact Release

Control Access to AI Models and Data at Rest

Encrypt Sensitive Information

AI Model Distribution Methods

Sanitize Training Data

Validate AI Model

AI Bill of Materials

Maintain AI Dataset Provenance

Verify AI Artifacts

Use Ensemble Methods

Code Signing

Generative AI Guardrails

Restrict Library Loading

Vulnerability Scanning

User Training

Limit Public Release of Information

Control Access to AI Models and Data in Production
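
As an illustration of the "Sanitize Training Data" mitigation listed above, the following is a minimal sketch of redacting obvious PII from scraped text before it is used for training or fine-tuning. It is not taken from the source report or the repository. The function names (`redact_pii`, `sanitize_corpus`), the placeholder tokens, and the two regular expressions are assumptions chosen for illustration, and they cover only email addresses and phone-like numbers.

```python
import re

# Hypothetical, deliberately narrow patterns (assumptions for this sketch).
# Real PII detection needs far broader coverage: names, addresses, IDs, etc.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def redact_pii(text: str) -> tuple[str, int]:
    """Replace matched emails and phone-like numbers with placeholder tokens.

    Returns the redacted text and the number of replacements made.
    """
    text, n_email = EMAIL_RE.subn("[EMAIL]", text)
    text, n_phone = PHONE_RE.subn("[PHONE]", text)
    return text, n_email + n_phone


def sanitize_corpus(records):
    """Yield a redacted copy of each scraped text record."""
    for record in records:
        cleaned, _ = redact_pii(record)
        yield cleaned


if __name__ == "__main__":
    sample = ["Contact Jane at jane.doe@example.com or +1 (555) 010-2345."]
    for line in sanitize_corpus(sample):
        print(line)
```

Pattern matching of this kind is only a first pass. It will miss names and free-form identifiers, and the phone pattern can also match other long digit sequences such as timestamps, so a real pipeline would likely combine it with dedicated PII detection tooling and manual review.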
