Overview
Risk patterns
Patterns found in the case record and its linked vulnerabilities.
- Dominant ATLAS tactic. AI Model Access appears in 1 case step.
- Multiple attack methods. The case connects to 4 unique AI attack methods.
Procedure timeline
-
AI Model Access
Step 1
Adversaries were able to interact with Tay via Twitter messages.
-
Initial Access
Step 2
Data
The Tay bot used interactions with its Twitter users as training data to improve its conversations. Adversaries coordinated to deface Tay by exploiting this feedback loop.
-
Persistence
Step 3
Poison Training Data
By repeatedly interacting with Tay using racist and offensive language, adversaries skewed Tay's dataset towards that language. They did this with the "repeat after me" function, a command that forced Tay to repeat anything said to it.
-
Impact
Step 4
Erode AI Model Integrity
As a result of this coordinated attack, Tay's conversation algorithms began to learn to generate reprehensible material. Having internalized this language, Tay repeated it unprompted during interactions with innocent users.
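The feedback loop described in Steps 2 through 4 can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not a reconstruction of Tay: the `ToyOnlineBot` class, the phrase-count "model", the `run_campaign` helper, and all of the numbers are hypothetical and chosen only to show how unfiltered user input can come to dominate what a bot learns.

```python
from collections import Counter


class ToyOnlineBot:
    """Hypothetical bot that adds every received message to its training data."""

    def __init__(self, seed_corpus):
        # Phrase counts stand in for a learned conversation model (assumption).
        self.phrase_counts = Counter(seed_corpus)

    def receive(self, message):
        # Feedback loop: user input goes straight into the training data,
        # with no filtering and no per-user weighting.
        self.phrase_counts[message] += 1

    def share_of(self, phrase):
        # Probability of emitting `phrase` if replies are sampled
        # in proportion to how often each phrase was seen.
        total = sum(self.phrase_counts.values())
        return self.phrase_counts[phrase] / total


def run_campaign(bot, payload, accounts, messages_per_account):
    # Coordinated adversaries: many accounts repeat the same payload.
    for _ in range(accounts):
        for _ in range(messages_per_account):
            bot.receive(payload)


BENIGN = ["hello there", "nice weather today", "tell me a joke", "good morning"]
PAYLOAD = "<offensive phrase>"  # placeholder for the poisoned content

bot = ToyOnlineBot(BENIGN * 25)  # 100 benign training examples
before = bot.share_of(PAYLOAD)
run_campaign(bot, PAYLOAD, accounts=20, messages_per_account=15)  # 300 poisoned examples
after = bot.share_of(PAYLOAD)

print(f"payload share before: {before:.2f}")  # 0.00
print(f"payload share after:  {after:.2f}")   # 0.75
```

In this sketch the payload goes from absent to three quarters of the learned distribution, so the bot would emit it even to users who never sent it. That is the shift from poisoned training data to eroded model integrity described above.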
Mitigations
Defenses connected to the attack methods in this case.
Sources
Original public records and references for this case.
Open the MITRE ATLAS data and public references used for this case study.