PromptRiskDBThreat intelligence atlas
AI Risk

Scraping to train data

"When companies scrape personal information and use it to create generative AI tools, they undermine consumers’ control of their personal information by using the information for a purpose for which the consumer did not consent. The individual may not have even imagined their data could be used in the way the company intends when the person posted it online. Individual storing or hosting of scraped personal data m...

AI Risk2. Privacy & Security2.1 > Compromise of privacy by leaking or correctly inferring sensitive information1 - Pre-deployment

Record summary

A quick snapshot of what this page covers.

Techniques2Attack methods connected to this risk.
Mitigations8Defenses that may help with related attacks.
Domain2. Privacy & SecurityThe broad risk area this belongs to.

Risk profile

How this risk is described and categorized.

"When companies scrape personal information and use it to create generative AI tools, they undermine consumers’ control of their personal information by using the information for a purpose for which the consumer did not consent. The individual may not have even imagined their data could be used in the way the company intends when the person posted it online. Individual storing or hosting of scraped personal data may not always be harmful in a vacuum, but there are many risks. Multiple data sets can be combined in ways that cause harm: information that is not sensitive when spread across different databases can be extremely revealing when collected in a single place, and it can be used to make inferences about a person or population. And because scraping makes a copy of someone’s data as it existed at a specific time, the company also takes away the individual’s ability to alter or remove the information from the public sphere. "

Domain2. Privacy & Security
Subdomain2.1 > Compromise of privacy by leaking or correctly inferring sensitive information
Entity1 - Human
Intent1 - Intentional
Timing1 - Pre-deployment
CategoryOpaque Data Collection
SubcategoryScraping to train data

Suggested mitigations

Defenses that may help with related attacks.

AI Telemetry Logging

DeploymentMonitoring and Maintenance
LifecycleDeployment + 1 moreCategoryTechnical - Cyber

Source

Research source for this risk, when available.