APromptRiskDBThreat intelligence atlas
AI Security Technique

AI Intellectual Property Theft - AI Security Technique

Adversaries may exfiltrate AI artifacts to steal intellectual property and cause economic harm to the victim organization. Proprietary training data is costly to collect and annotate and may be a target for Exfiltration and theft. AIaaS providers charge for use of their API. An adversary who has stolen a model via Exfiltration or via [Extract AI Model](/techniques/AML...

AI Security Techniquerealized

Record summary

A quick snapshot of what this page covers.

Tactics0Attacker goals connected to this method.
Mitigations3Defenses that may help against this attack.
AI risks5Research-backed risks connected to this topic.

Attack context

How this AI attack works in practice.

Adversaries may exfiltrate AI artifacts to steal intellectual property and cause economic harm to the victim organization.

Proprietary training data is costly to collect and annotate and may be a target for Exfiltration and theft.

AIaaS providers charge for use of their API. An adversary who has stolen a model via Exfiltration or via Extract AI Model now has unlimited use of that service without paying the owner of the intellectual property.

ATLAS ID
AML.T0048.004
Priority score
114
Maturity: realized

Mitigations

Defenses that may help against this attack.

Case studies

Examples from public reports and exercises.

Model Distillation Campaigns Targeting Anthropic Claude

incident
Date2026-02-23

Anthropic uncovered campaigns to extract Claude’s capabilities carried out by the three Chinese AI Labs: DeepSeek, Moonshot, and MiniMax. Collectively, these campaigns used approximately 24,000 accounts and 16 million queries. They used model distillation to train their own models on the outputs of Claude in an attempt to replicate Claude’s capabilities such as agentic reasoning, code generation, tool use, and computer use.

As outlined in Anthropic's report, model distillation was leveraged as a means for these labs to undermine Anthropic's export controls.[<sup>\[1\]</sup>][1] Distilled models lack the safeguards that prevent bad actors from using frontier models for malicious purposes such as the bioweapon development, disinformation, offensive cyber operations, and mass surveillance.

References

  1. [1] https://www.anthropic.com/news/detecting-and-preventing-distillation-attacks

AI Model Tampering via Supply Chain Attack

exercise
Date2023-09-26

Researchers at Trend Micro, Inc. used service indexing portals and web searching tools to identify over 8,000 misconfigured private container registries exposed on the internet. Approximately 70% of the registries also had overly permissive access controls that allowed write access. In their analysis, the researchers found over 1,000 unique AI models embedded in private container images within these open registries that could be pulled without authentication.

This exposure could allow adversaries to download, inspect, and modify container contents, including sensitive AI model files. This is an exposure of valuable intellectual property which could be stolen by an adversary. Compromised images could also be pushed to the registry, leading to a supply chain attack, allowing malicious actors to compromise the integrity of AI models used in production systems.

Organization Confusion on Hugging Face

exercise
Date2023-08-23

threlfall_hax, a security researcher, created organization accounts on Hugging Face, a public model repository, that impersonated real organizations. These false Hugging Face organization accounts looked legitimate so individuals from the impersonated organizations requested to join, believing the accounts to be an official site for employees to share models. This gave the researcher full access to any AI models uploaded by the employees, including the ability to replace models with malicious versions. The researcher demonstrated that they could embed malware into an AI model that provided them access to the victim organization's environment. From there, threat actors could execute a range of damaging attacks such as intellectual property theft or poisoning other AI models within the victim's environment.

Arbitrary Code Execution with Google Colab

exercise
Date2022-07-01

Google Colab is a Jupyter Notebook service that executes on virtual machines. Jupyter Notebooks are often used for ML and data science research and experimentation, containing executable snippets of Python code and common Unix command-line functionality. In addition to data manipulation and visualization, this code execution functionality can allow users to download arbitrary files from the internet, manipulate files on the virtual machine, and so on.

Users can also share Jupyter Notebooks with other users via links. In the case of notebooks with malicious code, users may unknowingly execute the offending code, which may be obfuscated or hidden in a downloaded script, for example.

When a user opens a shared Jupyter Notebook in Colab, they are asked whether they'd like to allow the notebook to access their Google Drive. While there can be legitimate reasons for allowing Google Drive access, such as to allow a user to substitute their own files, there can also be malicious effects such as data exfiltration or opening a server to the victim's Google Drive.

This exercise raises awareness of the effects of arbitrary code execution and Colab's Google Drive integration. Practice secure evaluations of shared Colab notebook links and examine code prior to execution.

Attack on Machine Translation Services

exercise
Date2020-04-30

Machine translation services (such as Google Translate, Bing Translator, and Systran Translate) provide public-facing UIs and APIs. A research group at UC Berkeley utilized these public endpoints to create a replicated model with near-production state-of-the-art translation quality. Beyond demonstrating that IP can be functionally stolen from a black-box system, they used the replicated model to successfully transfer adversarial examples to the real production services. These adversarial inputs successfully cause targeted word flips, vulgar outputs, and dropped sentences on Google Translate and Systran Translate websites.

Source

Where this page information comes from.