Overview
Risk patterns
Patterns found in the case record and its linked vulnerabilities.
- Dominant ATLAS tactic. Resource Development appears in 3 case steps.
- Multiple attack methods. The case connects to 5 unique AI attack methods.
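The "dominant tactic" figure can be checked by tallying the tactic label attached to each step in the procedure timeline of this record. The snippet below is a minimal sketch of that tally; the list `step_tactics` is a hand transcription of the five timeline entries and is an assumption of this example, not a field exported by ATLAS.

```python
from collections import Counter

# Tactic label of each case step, transcribed by hand from the
# procedure timeline in this record (illustrative, not an ATLAS export).
step_tactics = [
    "Reconnaissance",
    "Resource Development",
    "Resource Development",
    "Resource Development",
    "AI Attack Staging",
]

# Count how many steps carry each tactic and pick the most frequent one.
counts = Counter(step_tactics)
dominant_tactic, dominant_count = counts.most_common(1)[0]

print(dominant_tactic, dominant_count)  # Resource Development 3
```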
Procedure timeline
-
Reconnaissance
Step 1
Using the public documentation about GPT-2, the researchers gathered information about the dataset, model architecture, and training hyperparameters.
-
Resource Development
Step 2
Models
The researchers obtained a reference implementation of a similar publicly available model called Grover.
-
Resource Development
Step 3
Datasets
The researchers manually recreated the dataset used in the original GPT-2 paper from the documentation they had gathered.
-
Resource Development
Step 4
The researchers were able to use TensorFlow Research Cloud via their academic credentials.
-
AI Attack Staging
Step 5
The researchers modified Grover's objective function to match GPT-2's objective function, then trained on the dataset they had curated using Grover's initial hyperparameters. The resulting model functionally replicates GPT-2, obtaining similar performance on most datasets. A bad actor who followed the same procedure as the researchers could then use the replicated GPT-2 model for malicious purposes.
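For readers unfamiliar with the term, the objective function mentioned in this step can be illustrated with a small example. The sketch below assumes the objective in question is the standard left-to-right language-modeling loss described in the GPT-2 paper (average negative log-likelihood of each token given the tokens before it). The function name `next_token_loss` and the toy inputs are hypothetical; the case record does not include the researchers' code.

```python
import math

def next_token_loss(logits, token_ids):
    """Average negative log-likelihood of each token given its predecessors.

    logits[t] is a list of unnormalized scores over the vocabulary for the
    token at position t + 1, computed from tokens 0..t.
    token_ids is the full token sequence.
    Hypothetical helper for illustration only.
    """
    total = 0.0
    num_predictions = len(token_ids) - 1
    for t in range(num_predictions):
        scores = logits[t]
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(scores)
        log_z = m + math.log(sum(math.exp(s - m) for s in scores))
        # Negative log-probability of the token that actually came next.
        total += log_z - scores[token_ids[t + 1]]
    return total / num_predictions

# Toy check: uniform scores over a 4-token vocabulary give a loss of log(4).
uniform_logits = [[0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]]
loss = next_token_loss(uniform_logits, [1, 2, 3])
```

With uniform scores the model assigns probability 1/4 to every token, so the loss equals log(4); a trained model would lower this value on real text.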
Mitigations
Defenses connected to the attack methods in this case.
Sources
Original public records and references for this case.
Original source
Open the MITRE ATLAS data and public references used for this case study.