GPT-2 Model Replication - AI Case Study

OpenAI built GPT-2, a language model capable of generating high-quality text samples. Concerned that GPT-2 could be used for malicious purposes such as impersonating others or generating misleading news articles, fake social media content, or spam, OpenAI adopted a tiered release schedule. They initially released a smaller, less powerful version of GPT-2 along with a technical description of the approach, but...

Exercise | OpenAI GPT-2 | Researchers at Brown University | Resource Development | Reconnaissance | AI Attack Staging

Overview

Case steps: 5. Steps described in the case record.
Techniques: 5. Attack methods mentioned in the case steps.
Linked CVEs: 0. Known vulnerabilities mentioned in the record.

Risk patterns

Patterns found in the case record and its linked vulnerabilities.

  1. Dominant ATLAS tactic. Resource Development appears in 3 case steps.
  2. Multiple attack methods. The case connects to 5 unique AI attack methods.
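
The "dominant tactic" pattern is a simple tally over the case steps. A minimal sketch, assuming the per-tactic step counts reported for this case (Resource Development 3, Reconnaissance 1, AI Attack Staging 1); the variable names are illustrative and not part of any ATLAS tooling:

```python
from collections import Counter

# Per-tactic step counts as reported in this case record.
tactic_counts = Counter({
    "Resource Development": 3,
    "Reconnaissance": 1,
    "AI Attack Staging": 1,
})

# Total number of case steps is the sum over all tactics.
total_steps = sum(tactic_counts.values())

# The dominant tactic is the one attached to the most steps.
dominant_tactic, dominant_count = tactic_counts.most_common(1)[0]

summary = f"{dominant_tactic} appears in {dominant_count} of {total_steps} case steps."
print(summary)
```

The total agrees with the five case steps listed in the overview.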

Procedure timeline

Resource Development: 3, Reconnaissance: 1, AI Attack Staging: 1
  1. Step 2

    Models

    Resource Development

    The researchers obtained a reference implementation of a similar publicly available model called Grover.

  2. Step 3

    Datasets

    Resource Development

    The researchers were able to manually recreate the dataset used in the original GPT-2 paper using the gathered documentation.

  3. AI Attack Staging

    The researchers modified Grover's objective function to match GPT-2's objective function and then trained the model on their curated dataset using Grover's initial hyperparameters. The resulting model functionally replicates GPT-2, obtaining similar performance on most datasets. A bad actor who followed the same procedure could then use the replicated GPT-2 model for malicious purposes.
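
The case record does not spell out either objective function. As a rough illustration of what the final step targets, the sketch below assumes GPT-2's objective is standard left-to-right next-token prediction, that is, the average negative log-likelihood of each token given the tokens before it. The function names and the toy inputs are hypothetical; this is not the researchers' code.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over one list of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def next_token_loss(logits_per_position, token_ids):
    """Average negative log-likelihood of each token given its prefix.

    logits_per_position[t] holds the model's scores over the vocabulary
    for the token at position t + 1, computed from tokens 0..t.
    """
    if len(logits_per_position) != len(token_ids) - 1:
        raise ValueError("need exactly one logit vector per predicted token")
    total = 0.0
    for t, logits in enumerate(logits_per_position):
        target = token_ids[t + 1]  # the token the model should predict next
        total -= log_softmax(logits)[target]
    return total / len(logits_per_position)

# Toy example (assumed data): vocabulary of 3 tokens, sequence of 3 tokens.
tokens = [0, 2, 1]

# A model that is indifferent between all tokens.
uniform_logits = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
uniform_loss = next_token_loss(uniform_logits, tokens)

# A model that strongly favors the correct next token at each position.
confident_logits = [[0.0, 0.0, 5.0], [0.0, 5.0, 0.0]]
confident_loss = next_token_loss(confident_logits, tokens)

print(uniform_loss, confident_loss)
```

With uniform scores the loss equals the log of the vocabulary size, and it falls as the model puts more weight on the correct next token. Training minimizes this quantity over the curated dataset.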

Mitigations

Defenses connected to the attack methods in this case.

Sources

Original public records and references for this case.

Open the MITRE ATLAS data and public references used for this case study.