Black-Box Transfer - AI Security Technique

Overview

A source-backed snapshot of this AI security technique.

In a Black-Box Transfer attack, the adversary uses one or more proxy models (obtained via Create Proxy AI Model or Train Proxy via Replication) to which they have full access and which are representative of the target model. The adversary applies White-Box Optimization to the proxy models to generate adversarial examples. If the proxy models are close enough to the target model, the adversarial examples should generalize from one to the other, so an attack that works against the proxy models is likely to work against the target model as well. If the adversary has AI Model Inference API Access, they may use Verify Attack to confirm that the attack is working and feed that information back into their training process.
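The transfer step described above can be illustrated with a small, self-contained sketch. Everything in it is an assumption made for illustration and is not taken from ATLAS: the two-blob toy dataset, the logistic-regression models standing in for both the target and the proxy, the single signed-gradient (FGSM-style) step used as the white-box optimization, and the perturbation size `eps=1.5`. The point is only that the perturbation is computed from the proxy's gradients and then evaluated against a separately trained target.

```python
import math
import random

def make_data(rng, n):
    """Two 2-D Gaussian blobs: label 1 around (+1, +1), label 0 around (-1, -1)."""
    data = []
    for _ in range(n):
        y = rng.randint(0, 1)
        c = 1.0 if y == 1 else -1.0
        data.append(((rng.gauss(c, 0.5), rng.gauss(c, 0.5)), y))
    return data

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logreg(data, steps=200, lr=0.5):
    """Full-batch gradient descent on logistic regression; returns (w0, w1, b)."""
    w0 = w1 = b = 0.0
    n = len(data)
    for _ in range(steps):
        g0 = g1 = gb = 0.0
        for (x0, x1), y in data:
            err = sigmoid(w0 * x0 + w1 * x1 + b) - y
            g0 += err * x0
            g1 += err * x1
            gb += err
        w0 -= lr * g0 / n
        w1 -= lr * g1 / n
        b -= lr * gb / n
    return w0, w1, b

def predict(model, x):
    w0, w1, b = model
    return 1 if w0 * x[0] + w1 * x[1] + b > 0 else 0

def sign(v):
    return (v > 0) - (v < 0)

def fgsm(model, x, y, eps):
    """One signed-gradient step that increases the given model's cross-entropy loss."""
    w0, w1, b = model
    err = sigmoid(w0 * x[0] + w1 * x[1] + b) - y  # derivative of the loss w.r.t. the logit
    return (x[0] + eps * sign(err * w0), x[1] + eps * sign(err * w1))

def accuracy(model, points, labels):
    return sum(predict(model, x) == y for x, y in zip(points, labels)) / len(labels)

rng = random.Random(0)
target = train_logreg(make_data(rng, 300))  # stands in for the model under attack
proxy = train_logreg(make_data(rng, 300))   # adversary's model, trained on separate samples
test = make_data(rng, 200)
points = [x for x, _ in test]
labels = [y for _, y in test]

# Perturbations are crafted with the proxy's gradients only; the target is never differentiated.
adv_points = [fgsm(proxy, x, y, eps=1.5) for x, y in test]

clean_acc = accuracy(target, points, labels)
adv_acc = accuracy(target, adv_points, labels)
print(f"target accuracy: clean={clean_acc:.2f}, transferred adversarial={adv_acc:.2f}")
```

In this toy setting the two models learn nearly the same decision boundary, so the target's accuracy on the transferred inputs should fall well below its clean accuracy. Real targets and proxies differ far more, and transfer rates vary accordingly.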

Tactics (0): Attacker goals connected to this method.

Mitigations (4): Defenses that may help against this attack.

AI risks (0): Research-backed risks connected to this topic.

Technique details

Identifiers, maturity, and source taxonomy for this technique.

ATLAS ID: AML.T0043.002
Maturity: demonstrated
Priority score: 62

Attack flow

How to read the public records connected to this technique.

1. Technique: Read the ATLAS description and evidence level.

2. Tactics: See which attacker goals this method supports.

3. Examples: Check whether public case studies mention it.

4. Defenses: Review safeguards mapped by ATLAS.

5. Sources: Open the original public records and references.

Impact

Why this technique may deserve attention in the current dataset.

Evidence level: demonstrated
Mapped defenses: 4 ATLAS mitigation records
Public examples: 3 linked case study records
Research risks: 0 related MIT AI Risk records above the confidence threshold
Vulnerabilities: 0 linked CVE records

Mitigations

Defenses that may help against this attack.

4 records

AML.M0015 - Adversarial Input Detection

Incorporate adversarial input detection to block malicious inputs at inference time.

Lifecycle: Data Preparation, ML Model Engineering + 3 more

Category: Technical - ML

AML.M0010 - Input Restoration

Input restoration can help remediate adversarial inputs.

Lifecycle: Data Preparation, ML Model Evaluation + 2 more

Category: Technical - ML
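As one possible illustration of input restoration, the sketch below snaps each input feature to a coarse grid so that small perturbations are rounded away before the input reaches the model. The choice of quantization as the restoration step, the function name `quantize`, and the example values are all assumptions made here; ATLAS does not prescribe a specific method.

```python
def quantize(values, levels=5):
    """Snap each value in [0, 1] to the nearest of `levels` evenly spaced steps."""
    step = 1.0 / (levels - 1)
    return [min(1.0, max(0.0, round(v / step) * step)) for v in values]

# Hypothetical clean features that already lie on the 5-level grid.
clean = [0.0, 0.25, 0.75, 1.0]

# A small adversarial-style perturbation, well under half a grid step (0.125).
perturbed = [v + d for v, d in zip(clean, [0.03, -0.04, 0.02, -0.03])]

restored = quantize(perturbed)
print(restored)  # [0.0, 0.25, 0.75, 1.0]
```

Quantization of this kind only removes perturbations smaller than half a grid step, and it also discards legitimate detail, so it is a trade-off and not a guarantee. An adversary who knows about the preprocessing may craft larger perturbations that survive it.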

AML.M0003 - Model Hardening

Hardened models are more robust to adversarial inputs.

Lifecycle: Data Preparation, ML Model Engineering

Category: Technical - ML

AML.M0006 - Use Ensemble Methods

Using an ensemble of models increases the difficulty of crafting effective adversarial data and improves overall robustness.

Lifecycle: ML Model Engineering

Category: Technical - ML
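The intuition behind the ensemble mitigation can be shown with a minimal majority-vote sketch. The member models here are hypothetical one-feature threshold classifiers, and the thresholds and the input score are invented for illustration. An input tuned to slip under one member's decision boundary does not automatically change the ensemble's answer.

```python
from collections import Counter

def make_threshold_model(threshold):
    """Hypothetical member model: returns 1 (malicious) when the score exceeds `threshold`."""
    return lambda score: 1 if score > threshold else 0

def ensemble_predict(models, x):
    """Return the label chosen by the majority of member models."""
    votes = Counter(m(x) for m in models)
    return votes.most_common(1)[0][0]

# An odd number of members avoids tied votes on a binary label.
models = [make_threshold_model(t) for t in (0.40, 0.50, 0.60)]

# A score pushed just under the third member's threshold, as if that member were the proxy.
evasive = 0.55

print(models[2](evasive))                 # 0: the single member is evaded
print(ensemble_predict(models, evasive))  # 1: the majority still flags the input
```

This only helps to the extent that the members disagree about where the boundary lies. If all members share the same weaknesses, an example that transfers to one is likely to transfer to the rest.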

Case studies

Examples from public reports and exercises.

3 records

Confusing Antimalware Neural Networks

Cloud storage and computation have become popular for deploying ML malware detectors. In such deployments, the features for the models are built on users' systems and then sent to the cybersecurity company's servers. The Kaspersky ML research team explored this gray-box scenario and showed that knowledge of the features is enough to mount an adversarial attack on ML models.

They attacked one of Kaspersky's antimalware ML models without white-box access to it and successfully evaded detection for most of the adversarially modified malware files.

Date: 2021-06-23

Type: exercise

Attack on Machine Translation Services

Machine translation services (such as Google Translate, Bing Translator, and Systran Translate) provide public-facing UIs and APIs. A research group at UC Berkeley utilized these public endpoints to create a replicated model with near-production state-of-the-art translation quality. Beyond demonstrating that IP can be functionally stolen from a black-box system, they used the replicated model to successfully transfer adversarial examples to the real production services. These adversarial inputs successfully cause targeted word flips, vulgar outputs, and dropped sentences on Google Translate and Systran Translate websites.

Date: 2020-04-30

Type: exercise

ProofPoint Evasion

Proof Pudding (CVE-2019-20634) is a code repository that describes how ML researchers evaded ProofPoint's email protection system by first building a copycat email protection ML model and then using the insights gained from it to bypass the live system. More specifically, the insights allowed the researchers to craft malicious emails that received favorable scores and went undetected by the system. Each word in an email is scored numerically based on multiple variables, and if the overall score of the email is too low, ProofPoint will output an error, labeling it as SPAM.

Date: 2019-09-09

Type: exercise

Source evidence

Original public records and references for this page.

Original source links

Open the public records and source datasets used for this page.

Repository: https://github.com/mitre-atlas/atlas-data
ATLAS.yaml: https://github.com/mitre-atlas/atlas-data/blob/main/dist/ATLAS.yaml
Schema: https://github.com/mitre-atlas/atlas-data/blob/main/dist/schemas/atlas_output_schema.json
