PromptRiskDB - Threat intelligence atlas
AI Model Inference API Access - AI Security Technique

Record summary

A quick snapshot of what this page covers.

Tactics: 1. Attacker goals connected to this method.
Mitigations: 2. Defenses that may help against this attack.
AI risks: 19. Research-backed risks connected to this topic.

Attack context

How this AI attack works in practice.

Adversaries may gain access to a model via legitimate access to the inference API. Inference API access can be a source of information to the adversary (Discover AI Model Ontology, Discover AI Model Family), a means of staging the attack (Verify Attack, Craft Adversarial Data), or for introducing data to the target system for Impact (Evade AI Model, Erode AI Model Integrity).

Many systems rely on the same models provided via an inference API, which means they share the same vulnerabilities. This is especially true of foundation models, which are prohibitively resource-intensive to train. Adversaries may use their access to model APIs to identify vulnerabilities such as jailbreaks or hallucinations, and then target applications that use the same models.

ATLAS ID: AML.T0040
Priority score: 201
Maturity: realized
AI Model Access

Mitigations

Defenses that may help against this attack.

AML.M0024 - AI Telemetry Logging

Lifecycle: Deployment, Monitoring and Maintenance
Category: Technical - Cyber

Telemetry logging can help audit API usage of the model.
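
As one illustration of what such telemetry might look like, the minimal Python sketch below wraps a model call and records one structured entry per request, then aggregates the entries per account for auditing. The names `log_inference`, `fake_model`, `queries_per_account`, and the record fields are hypothetical and are not taken from ATLAS or any particular product. Hashing the prompt rather than storing it is an assumed design choice.

```python
import hashlib
import json
import time
from collections import Counter

# Hypothetical in-memory sink; a real deployment would ship records to a log store.
TELEMETRY = []


def fake_model(prompt):
    """Stand-in for a real inference call (an assumption for this sketch)."""
    return prompt.upper()


def log_inference(account_id, prompt, model=fake_model, sink=TELEMETRY):
    """Call the model and append one audit record describing the request."""
    started = time.time()
    output = model(prompt)
    sink.append({
        "ts": started,
        "account_id": account_id,
        # Store a digest instead of the raw prompt to limit sensitive data in logs.
        "prompt_sha256": hashlib.sha256(prompt.encode("utf-8")).hexdigest(),
        "prompt_chars": len(prompt),
        "output_chars": len(output),
        "latency_s": time.time() - started,
    })
    return output


def queries_per_account(sink=TELEMETRY):
    """Aggregate the log into a per-account request count for auditing."""
    return Counter(record["account_id"] for record in sink)


if __name__ == "__main__":
    for _ in range(3):
        log_inference("acct-a", "hello")
    log_inference("acct-b", "hi")
    print(json.dumps(queries_per_account(), indent=2))
```

Per-account counts like these are only a starting point. Which fields to record, and how long to retain them, depends on the deployment.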

Case studies

Examples from public reports and exercises.

Model Distillation Campaigns Targeting Anthropic Claude

incident
Date: 2026-02-23

Anthropic uncovered campaigns to extract Claude’s capabilities carried out by three Chinese AI labs: DeepSeek, Moonshot, and MiniMax. Collectively, these campaigns used approximately 24,000 accounts and 16 million queries. The labs used model distillation, training their own models on Claude’s outputs, in an attempt to replicate capabilities such as agentic reasoning, code generation, tool use, and computer use.

As outlined in Anthropic's report, these labs used model distillation as a means to undermine Anthropic's export controls. [1] Distilled models lack the safeguards that prevent bad actors from using frontier models for malicious purposes such as bioweapon development, disinformation, offensive cyber operations, and mass surveillance.

References

[1] https://www.anthropic.com/news/detecting-and-preventing-distillation-attacks

ChatGPT Package Hallucination

exercise
Date: 2024-06-01

Researchers found that large language models such as ChatGPT can hallucinate software package names that have not been published to any package repository. An attacker could publish a malicious package under the hallucinated name. Users of the same or similar large language models may then encounter the same hallucination and ultimately download and execute the malicious package, leading to a variety of potential harms.
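
One hedged way to picture a defense on the consuming side: before installing anything an assistant suggests, check each name against a list of packages that has already been vetted. The sketch below is illustrative only and is not taken from the research described above. `VETTED_PACKAGES`, `extract_pip_packages`, and `unvetted` are hypothetical names, the allowlist is made-up sample data, `fastjsonx` is an invented package name, and the regular expression handles only simple single-line `pip install` commands.

```python
import re

# Hypothetical allowlist of packages a team has vetted (sample data, an assumption).
VETTED_PACKAGES = {"requests", "numpy", "pandas"}

# Matches the argument list of a simple single-line `pip install ...` command.
PIP_INSTALL = re.compile(r"pip install[ \t]+([^\n]+)")


def extract_pip_packages(llm_output):
    """Return package names mentioned in `pip install` lines of model output."""
    names = []
    for args in PIP_INSTALL.findall(llm_output):
        for token in args.split():
            if token.startswith("-"):
                continue  # skip flags such as --upgrade
            # Drop version specifiers like ==1.2.3 or >=2.0 and normalize case.
            name = re.split(r"[=<>!~\[]", token, maxsplit=1)[0].lower()
            if name:
                names.append(name)
    return names


def unvetted(llm_output, allowlist=VETTED_PACKAGES):
    """List suggested packages that are not on the allowlist and need review."""
    return [n for n in extract_pip_packages(llm_output) if n not in allowlist]


if __name__ == "__main__":
    answer = (
        "Install the helpers first:\n"
        "pip install --upgrade requests fastjsonx==0.1\n"
        "Then import them."
    )
    print(unvetted(answer))  # prints ['fastjsonx']
```

An allowlist check of this kind flags unknown names for human review. It does not by itself establish whether a flagged package exists or is malicious.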

Morris II Worm: RAG-Based Attack

exercise
Date: 2024-03-05

Researchers developed Morris II, a zero-click worm designed to attack generative AI (GenAI) ecosystems and propagate between connected GenAI systems. The worm uses an adversarial self-replicating prompt, which relies on prompt injection to make the model replicate the prompt in its output and perform malicious activity. The researchers demonstrated how the worm can propagate through an email system with a RAG-based assistant. The target system automatically ingests received emails, retrieves past correspondence, and generates a reply for the user. To carry out the attack, the researchers sent a malicious email containing the adversarial self-replicating prompt, which ends up in the RAG database. The malicious instructions in the prompt tell the assistant to include sensitive user data in the response. Future requests to the email assistant may retrieve the malicious email. The self-replicating portion of the prompt then propagates the worm, and the malicious instructions leak private information.

Attack on Machine Translation Services

exercise
Date: 2020-04-30

Machine translation services (such as Google Translate, Bing Translator, and Systran Translate) provide public-facing UIs and APIs. A research group at UC Berkeley used these public endpoints to create a replicated model with near-production, state-of-the-art translation quality. Beyond demonstrating that IP can be functionally stolen from a black-box system, they used the replicated model to transfer adversarial examples to the real production services. These adversarial inputs caused targeted word flips, vulgar outputs, and dropped sentences on the Google Translate and Systran Translate websites.

Microsoft Edge AI Evasion

exercise
Date: 2020-02-01

The Azure Red Team performed a red team exercise on a new Microsoft product designed for running AI workloads at the edge. The goal of the exercise was to use an automated system to continuously manipulate a target image and cause the ML model to misclassify it.

Face Identification System Evasion via Physical Countermeasures

exercise
Date: 2020-01-01

MITRE's AI Red Team demonstrated a physical-domain evasion attack on a commercial face identification service with the intention of inducing a targeted misclassification. The operation combined traditional MITRE ATT&CK techniques, such as finding valid accounts and executing code via an API, with attacks specific to adversarial ML.

Microsoft Azure Service Disruption

exercise
Date: 2020-01-01

The Microsoft AI Red Team performed a red team exercise on an internal Azure service with the intention of disrupting it. The operation combined traditional ATT&CK enterprise techniques, such as finding valid accounts and exfiltrating data, with steps specific to adversarial ML, such as offline and online evasion examples.

Source

Where this page's information comes from.