Craft Adversarial Data - AI Security Technique

Overview

A source-backed snapshot of this AI security technique.

Adversarial data are inputs to an AI model that have been modified such that they cause the adversary's desired effect in the target model. Effects can range from misclassification, to missed detections, to maximizing energy consumption. Typically, the modification is constrained in magnitude or location so that a human still perceives the data as if it were unmodified, but human perceptibility may not always be a concern depending on the adversary's intended effect. For example, an adversarial input for an image classification task is an image the AI model would misclassify, but a human would still recognize as containing the correct class.
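The magnitude constraint described above is often expressed as a bound on how far each value may move from the original. The following is a minimal sketch, assuming an L-infinity budget `eps` and data scaled to the range [0, 1]. The function names and sample values are hypothetical illustrations and do not come from ATLAS.

```python
def project_linf(original, candidate, eps, lo=0.0, hi=1.0):
    """Clamp candidate so each value stays within eps of original and inside [lo, hi]."""
    projected = []
    for x0, x in zip(original, candidate):
        x = max(x0 - eps, min(x0 + eps, x))  # magnitude constraint
        x = max(lo, min(hi, x))              # keep the data in its valid range
        projected.append(x)
    return projected


def linf_distance(a, b):
    """Largest per-element change between two inputs."""
    return max(abs(x - y) for x, y in zip(a, b))


original = [0.20, 0.50, 0.90]    # hypothetical pixel values
candidate = [0.35, 0.48, 1.10]   # unconstrained modification
bounded = project_linf(original, candidate, eps=0.05)
print(bounded, linf_distance(original, bounded))
```

A location constraint could be sketched the same way by projecting only a chosen subset of positions and leaving the rest untouched.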

Depending on their knowledge of and access to the target model, the adversary may use different classes of algorithms to develop the adversarial example, such as White-Box Optimization, Black-Box Optimization, Black-Box Transfer, or Manual Modification.
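The difference between the white-box and black-box classes can be illustrated on a toy model. This sketch assumes a hypothetical three-feature linear classifier: the white-box routine reads the weights directly, while the black-box routine only calls the prediction function. All names and values are illustrative, and transfer and manual approaches are not shown.

```python
import random

# Hypothetical toy linear model; a white-box adversary can read these values.
WEIGHTS = [1.5, -2.0, 0.5]
BIAS = -0.1


def score(x):
    return sum(w * xi for w, xi in zip(WEIGHTS, x)) + BIAS


def predict(x):
    """Stand-in for an inference API: returns only the predicted label."""
    return 1 if score(x) > 0 else 0


def sign(v):
    return (v > 0) - (v < 0)


def white_box_step(x, eps):
    """One sign-gradient step; for a linear model the score gradient is WEIGHTS."""
    direction = -1 if predict(x) == 1 else 1  # push the score toward the other class
    return [xi + direction * eps * sign(w) for xi, w in zip(x, WEIGHTS)]


def black_box_search(x, eps, queries=200, seed=0):
    """Random search that only calls predict() and never reads WEIGHTS."""
    rng = random.Random(seed)
    label = predict(x)
    for _ in range(queries):
        candidate = [xi + rng.choice((-eps, eps)) for xi in x]
        if predict(candidate) != label:
            return candidate
    return None  # budget exhausted without finding a label change


x = [0.3, 0.1, 0.2]
adv_white = white_box_step(x, eps=0.1)
adv_black = black_box_search(x, eps=0.1)
print(predict(x), predict(adv_white), adv_black)
```

The white-box step needs a single computation, whereas the black-box search spends queries against the model, which is why query access is relevant to the mitigations listed on this page.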

The adversary may verify that their approach works (Verify Attack) if they have white-box or inference API access to the model. This allows the adversary to gain confidence that the attack is effective before using it in a "live" environment, where it may be noticed. They can then use the attack at a later time to accomplish their goals. An adversary may optimize adversarial examples to Evade AI Model or to Erode AI Model Integrity.

Tactics (1): Attacker goals connected to this method.

Mitigations (8): Defenses that may help against this attack.

AI risks (0): Research-backed risks connected to this topic.

Technique details

Identifiers, maturity, and source taxonomy for this technique.

ATLAS ID: AML.T0043
Maturity: realized
Priority score: 64

ATLAS tactics

AI Attack Staging

Attack flow

How to read the public records connected to this technique.

1. Technique: Read the ATLAS description and evidence level.

2. Tactics: See which attacker goals this method supports.

3. Examples: Check whether public case studies mention it.

4. Defenses: Review safeguards mapped by ATLAS.

5. Sources: Open the original public records and references.

Impact

Why this technique may deserve attention in the current dataset.

Evidence level: realized
Mapped defenses: 8 ATLAS mitigation records
Public examples: 1 linked case study record
Research risks: 0 related MIT AI Risk records above the confidence threshold
Vulnerabilities: 0 linked CVE records

Mitigations

Defenses that may help against this attack.

8 records

AML.M0015 - Adversarial Input Detection

Incorporate adversarial input detection to block malicious inputs at inference time.

Lifecycle: Data Preparation, ML Model Engineering, and 3 more
Category: Technical - ML
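One way to picture adversarial input detection is to compare the model's output on an input with its output on a reduced-precision copy of that input, an idea often called feature squeezing. ATLAS does not prescribe this method; the sketch below assumes a hypothetical toy scoring function, an arbitrary threshold, and clean data that happens to sit on the quantization grid.

```python
def squeeze(x, levels=4):
    """Reduce precision: snap each value in [0, 1] to one of `levels` evenly spaced steps."""
    step = 1.0 / (levels - 1)
    return [round(v / step) * step for v in x]


def toy_score(x):
    """Hypothetical stand-in for a model's output score."""
    weights = [1.5, -2.0, 0.5]
    return sum(w * v for w, v in zip(weights, x)) - 0.1


def looks_adversarial(x, threshold=0.3):
    """Flag inputs whose score shifts sharply once fine-grained detail is removed."""
    return abs(toy_score(x) - toy_score(squeeze(x))) > threshold


clean = [1 / 3, 0.0, 2 / 3]                      # already on the 4-level grid
perturbed = [1 / 3 - 0.1, 0.1, 2 / 3 - 0.1]      # small, targeted shifts
print(looks_adversarial(clean), looks_adversarial(perturbed))
```

In practice the threshold would need to be tuned on real clean data, since natural inputs also change somewhat when squeezed.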

AML.M0019 - Control Access to AI Models and Data in Production

Access controls on model APIs can restrict the access an adversary needs to generate adversarial data.

Lifecycle: Deployment, Monitoring
Category: Policy
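A per-client query budget is one possible form of access control on a model API, since query-based attacks typically need many inference calls. This is a minimal sketch under assumed limits; the `QueryBudget` class, its parameters, and the client identifier are hypothetical, and timestamps are passed in explicitly so the logic is easy to follow.

```python
from collections import defaultdict, deque


class QueryBudget:
    """Sliding-window limit on inference calls per client."""

    def __init__(self, max_queries, window_seconds):
        self.max_queries = max_queries
        self.window = window_seconds
        self._history = defaultdict(deque)  # client_id -> timestamps of accepted calls

    def allow(self, client_id, now):
        calls = self._history[client_id]
        # Forget calls that have aged out of the window.
        while calls and now - calls[0] >= self.window:
            calls.popleft()
        if len(calls) >= self.max_queries:
            return False
        calls.append(now)
        return True


budget = QueryBudget(max_queries=3, window_seconds=60)
decisions = [budget.allow("client-a", t) for t in (0, 1, 2, 3)]
later = budget.allow("client-a", 61)
print(decisions, later)
```

A limit like this raises the cost of a query-heavy search but does not address white-box or transfer approaches, which need no API access.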

AML.M0010 - Input Restoration

Input restoration can help remediate adversarial inputs.

Lifecycle: Data Preparation, ML Model Evaluation, and 2 more
Category: Technical - ML
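Input restoration can be pictured as a preprocessing step applied before inference. The sketch below assumes a one-dimensional signal and uses a median filter to smooth out an isolated spike; the filter choice, window size, and sample data are illustrative assumptions, not something ATLAS specifies.

```python
from statistics import median


def median_restore(signal, window=3):
    """Replace each value with the median of its neighbourhood."""
    half = window // 2
    restored = []
    for i in range(len(signal)):
        lo = max(0, i - half)
        hi = min(len(signal), i + half + 1)
        restored.append(median(signal[lo:hi]))
    return restored


tampered = [0.2, 0.2, 0.9, 0.2, 0.2]   # hypothetical input with one spiked value
print(median_restore(tampered))
```

A filter like this only helps against sparse, localized changes, and it also alters clean inputs, so its effect on normal accuracy would need to be checked.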

AML.M0003 - Model Hardening

Hardened models are more robust to adversarial inputs.

Lifecycle: Data Preparation, ML Model Engineering
Category: Technical - ML
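Adversarial training is one common hardening approach: the model is trained on worst-case perturbed versions of its training points. The sketch below assumes a toy perceptron on four hypothetical 2-D points with labels +1 and -1, and an L-infinity budget `eps`. It is an illustration under those assumptions rather than the hardening procedure ATLAS refers to.

```python
def sign(v):
    return (v > 0) - (v < 0)


def worst_case(x, y, w, eps):
    """Worst-case L-infinity perturbation of x against a linear model with weights w."""
    return [xi - eps * y * sign(wi) for xi, wi in zip(x, w)]


def margin(x, y, w, b):
    """Positive when the point is classified correctly."""
    return y * (sum(wi * xi for wi, xi in zip(w, x)) + b)


def train(data, eps=0.0, epochs=20):
    """Perceptron; with eps > 0 each update uses the worst-case perturbed point."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, y in data:
            x_train = worst_case(x, y, w, eps)
            if margin(x_train, y, w, b) <= 0:
                w = [wi + y * xi for wi, xi in zip(w, x_train)]
                b += y
    return w, b


def robust_accuracy(data, w, b, eps):
    """Fraction of points still classified correctly under worst-case perturbation."""
    hits = sum(margin(worst_case(x, y, w, eps), y, w, b) > 0 for x, y in data)
    return hits / len(data)


DATA = [([1.0, 0.8], 1), ([0.9, 1.1], 1), ([-1.0, -0.9], -1), ([-0.8, -1.2], -1)]
plain = train(DATA)                # standard training
hardened = train(DATA, eps=0.5)    # adversarial training
print(robust_accuracy(DATA, *plain, eps=0.5), robust_accuracy(DATA, *hardened, eps=0.5))
```

On this toy data both models classify the clean points correctly, but only the adversarially trained one keeps every point on the correct side at `eps=0.5`. Real models rarely show such a clean separation.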

Showing 4 of 8

Case studies

Examples from public reports and exercises.

1 record

VirusTotal Poisoning

McAfee Advanced Threat Research noticed an unusual increase in reports of a certain ransomware family. Case investigation revealed that many samples of that ransomware family had been submitted through a popular virus-sharing platform within a short amount of time. Further investigation revealed that, based on string similarity, the samples were all equivalent and, based on code similarity, they were between 74 and 98 percent similar. Interestingly, the compile time was the same for all the samples. After more digging, researchers discovered that someone had used 'metame', a metamorphic code manipulating tool, to mutate the original file into variants. The variants would not always be executable, but they were still classified as the same ransomware family.

Date: 2020-01-01

incident
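The similarity percentages in this case study come from comparing samples pairwise. The source does not say which tooling the researchers used; the sketch below assumes, for illustration only, that each sample has been reduced to a list of instruction mnemonics and scores pairs with Python's `difflib.SequenceMatcher`. The sample names and contents are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a, b):
    """Ratio in [0, 1]: twice the matched elements over the combined length."""
    return SequenceMatcher(None, a, b).ratio()


def pairwise_similarity(samples):
    """Score every pair of named samples."""
    return {
        (name_a, name_b): similarity(seq_a, seq_b)
        for (name_a, seq_a), (name_b, seq_b) in combinations(samples.items(), 2)
    }


samples = {
    "original": ["push", "mov", "call", "ret"],
    "variant": ["push", "mov", "nop", "call", "ret"],  # one inserted instruction
}
scores = pairwise_similarity(samples)
print(scores)
```

Here the inserted instruction leaves four of the nine combined elements' worth of matches intact on each side, giving a ratio of 8/9, which mirrors how a mutated variant can remain highly similar to its source.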

Related risks

Research-backed risks connected to this topic.

No related AI risks. No research risk is connected to this topic in the current data.

Vulnerabilities

Known software flaws linked to this context.

No related vulnerabilities. No software flaw is connected to this attack in the current data.

Source evidence

Original public records and references for this page.

Original source

Open the public records and source datasets used for this page.

Repository: https://github.com/mitre-atlas/atlas-data
ATLAS.yaml: https://github.com/mitre-atlas/atlas-data/blob/main/dist/ATLAS.yaml
Schema: https://github.com/mitre-atlas/atlas-data/blob/main/dist/schemas/atlas_output_schema.json