PromptRiskDB: Threat intelligence atlas
AI Security Technique

Craft Adversarial Data - AI Security Technique

AI Security Technique - realized - AI Attack Staging

Record summary

A quick snapshot of what this page covers.

Tactics: 1. Attacker goals connected to this method.
Mitigations: 8. Defenses that may help against this attack.
AI risks: 0. Research-backed risks connected to this topic.

Attack context

How this AI attack works in practice.

Adversarial data are inputs to an AI model that have been modified such that they cause the adversary's desired effect in the target model. Effects can range from misclassification, to missed detections, to maximizing energy consumption. Typically, the modification is constrained in magnitude or location so that a human still perceives the data as if it were unmodified, but human perceptibility may not always be a concern depending on the adversary's intended effect. For example, an adversarial input for an image classification task is an image the AI model would misclassify, but a human would still recognize as containing the correct class.
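
To make the idea of a magnitude-constrained modification concrete, here is a minimal sketch of a single signed-gradient step (in the style of the fast gradient sign method) against a toy logistic model. The weights, bias, input, and the `epsilon` bound are all hypothetical values chosen for illustration; they do not describe any real target model.

```python
import math

# Hypothetical toy model: logistic regression with fixed weights.
# These values are assumptions for illustration only.
WEIGHTS = [2.0, -3.0, 1.0]
BIAS = 0.5

def predict_proba(x):
    """Probability of class 1 under the toy logistic model."""
    z = sum(w * xi for w, xi in zip(WEIGHTS, x)) + BIAS
    return 1.0 / (1.0 + math.exp(-z))

def sign(v):
    """Return -1, 0, or 1 depending on the sign of v."""
    return (v > 0) - (v < 0)

def fgsm_perturb(x, true_label, epsilon):
    """Take one signed-gradient step, changing each feature by at most epsilon."""
    p = predict_proba(x)
    # For logistic regression, d(cross-entropy)/dx = (p - y) * w.
    grad = [(p - true_label) * w for w in WEIGHTS]
    return [xi + epsilon * sign(g) for xi, g in zip(x, grad)]

x = [0.2, 0.1, 0.1]                       # classified as class 1
x_adv = fgsm_perturb(x, true_label=1, epsilon=0.2)

print(predict_proba(x) > 0.5)             # original prediction is class 1
print(predict_proba(x_adv) > 0.5)         # perturbed prediction flips to class 0
```

Every feature of `x_adv` differs from `x` by at most `epsilon`, which is the kind of constraint that keeps a modification hard for a human to notice.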

Depending on their knowledge of and access to the target model, the adversary may use different classes of algorithms to develop the adversarial example, such as White-Box Optimization, Black-Box Optimization, Black-Box Transfer, or Manual Modification.
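
As a contrast to gradient-based (white-box) methods, the following sketch shows one simple form of black-box optimization: a coordinate-wise search that only uses confidence scores returned by the model. The `query_model` function is a hypothetical stand-in for a remote inference API, and the search strategy is one illustrative choice among many, not a specific published algorithm.

```python
import math

def query_model(x):
    """Hypothetical stand-in for an inference API returning class-1 confidence.

    The attacker is assumed to see only the returned score, not the weights.
    """
    z = 2.0 * x[0] - 3.0 * x[1] + 1.0 * x[2] + 0.5
    return 1.0 / (1.0 + math.exp(-z))

def black_box_coordinate_search(x, epsilon, max_queries=100):
    """Try +/- epsilon on each feature and keep changes that lower the score."""
    best = list(x)
    best_score = query_model(best)
    queries = 1
    for i in range(len(x)):
        for delta in (epsilon, -epsilon):
            if queries >= max_queries:
                return best, best_score, queries
            candidate = list(best)
            candidate[i] = x[i] + delta   # stay within epsilon of the original
            score = query_model(candidate)
            queries += 1
            if score < best_score:
                best, best_score = candidate, score
    return best, best_score, queries

x = [0.2, 0.1, 0.1]
x_adv, adv_score, used = black_box_coordinate_search(x, epsilon=0.2)
print(adv_score < 0.5, used)              # class flips after 7 queries
```

Because the attack depends on repeated queries, its cost is measured in API calls, which is why query limits and output obfuscation appear among the mitigations.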

The adversary may verify that their approach works (Verify Attack) if they have white-box or inference API access to the model. This allows the adversary to gain confidence that their attack is effective before using it in a "live" environment where it may be noticed. They can then use the attack at a later time to accomplish their goals. An adversary may optimize adversarial examples to Evade AI Model or to Erode AI Model Integrity.

ATLAS ID: AML.T0043
Priority score: 64
Maturity: realized
Tactic: AI Attack Staging

Mitigations

Defenses that may help against this attack.

AML.M0015 - Adversarial Input Detection

Lifecycle: Data Preparation, ML Model Engineering + 3 more
Category: Technical - ML

Incorporate adversarial input detection to block malicious inputs at inference time.
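
One way to picture inference-time detection is a prediction-consistency check in the style of feature squeezing: score the raw input and a coarsened copy, and flag the input when the two scores disagree strongly. The sketch below assumes a hypothetical toy classifier over four pixel intensities and an arbitrary threshold; neither comes from the mitigation entry itself.

```python
import math

# Hypothetical toy classifier over four pixel intensities in [0, 1].
WEIGHTS = [4.0, -4.0, 4.0, -4.0]
BIAS = -6.0

def model_score(pixels):
    """Class-1 confidence of the toy model."""
    z = sum(w * p for w, p in zip(WEIGHTS, pixels)) + BIAS
    return 1.0 / (1.0 + math.exp(-z))

def squeeze(pixels):
    """Reduce each pixel to 1 bit, discarding small perturbations."""
    return [1.0 if p >= 0.5 else 0.0 for p in pixels]

def looks_adversarial(pixels, threshold=0.3):
    """Flag inputs whose score changes sharply after squeezing."""
    gap = abs(model_score(pixels) - model_score(squeeze(pixels)))
    return gap > threshold

benign = [1.0, 0.0, 1.0, 0.0]
perturbed = [0.7, 0.3, 0.7, 0.3]          # each pixel shifted by 0.3

print(looks_adversarial(benign))          # False
print(looks_adversarial(perturbed))       # True
```

The threshold trades false positives against missed detections and would need tuning on real data.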

AML.M0010 - Input Restoration

Lifecycle: Data Preparation, ML Model Evaluation + 2 more
Category: Technical - ML

Input restoration can help remediate adversarial inputs.
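
As an illustration of input restoration, the sketch below applies a median filter to a one-dimensional signal before it would reach the model. It assumes the adversarial change is a sparse spike, which is only one possible perturbation shape; the function name and window size are hypothetical.

```python
from statistics import median

def median_restore(signal, radius=1):
    """Replace each sample with the median of its neighborhood.

    The window is clipped at the edges of the signal.
    """
    restored = []
    for i in range(len(signal)):
        lo = max(0, i - radius)
        hi = min(len(signal), i + radius + 1)
        restored.append(median(signal[lo:hi]))
    return restored

tampered = [0.2, 0.2, 0.9, 0.2, 0.2]      # one sample spiked by the adversary
print(median_restore(tampered))           # [0.2, 0.2, 0.2, 0.2, 0.2]
```

Restoration of this kind also alters benign inputs slightly, so its effect on clean accuracy should be measured before deployment.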

AML.M0003 - Model Hardening

Lifecycle: Data Preparation, ML Model Engineering
Category: Technical - ML

Hardened models are more robust to adversarial inputs.

AML.M0002 - Passive AI Output Obfuscation

Lifecycle: Deployment, ML Model Evaluation
Category: Technical - ML

Obfuscating model outputs reduces an adversary's ability to generate effective adversarial data.
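
A minimal sketch of output obfuscation, assuming the service would otherwise return full-precision class probabilities: keep only the top label and round its confidence. The function name and parameters are hypothetical.

```python
def obfuscate_output(probabilities, decimals=1, top_k=1):
    """Return only the top_k labels with coarsely rounded confidences."""
    ranked = sorted(probabilities.items(), key=lambda kv: kv[1], reverse=True)
    return {label: round(p, decimals) for label, p in ranked[:top_k]}

before = obfuscate_output({"cat": 0.6312, "dog": 0.3011, "fox": 0.0677})
after = obfuscate_output({"cat": 0.6188, "dog": 0.3135, "fox": 0.0677})

print(before)                             # {'cat': 0.6}
print(before == after)                    # True: the small score shift is hidden
```

Score-based black-box attacks rely on observing small confidence changes between queries, so hiding those changes removes the signal they optimize against.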

AML.M0004 - Restrict Number of AI Model Queries

Lifecycle: Business and Data Understanding, Deployment + 1 more
Category: Technical - Cyber

Restricting the number of model queries can reduce an adversary's ability to refine and evaluate adversarial queries.
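
Query restriction can be sketched as a per-client sliding-window limiter placed in front of the inference endpoint. The class name, limits, and injectable clock below are assumptions for illustration, not part of any particular API gateway.

```python
import time
from collections import defaultdict, deque

class QueryLimiter:
    """Allow at most max_queries per client within a sliding time window."""

    def __init__(self, max_queries, window_seconds, clock=time.monotonic):
        self.max_queries = max_queries
        self.window_seconds = window_seconds
        self.clock = clock                # injectable so the limiter can be tested
        self._history = defaultdict(deque)

    def allow(self, client_id):
        now = self.clock()
        history = self._history[client_id]
        # Drop timestamps that have aged out of the window.
        while history and now - history[0] >= self.window_seconds:
            history.popleft()
        if len(history) >= self.max_queries:
            return False
        history.append(now)
        return True

fake_time = [0.0]
limiter = QueryLimiter(max_queries=3, window_seconds=60, clock=lambda: fake_time[0])
results = [limiter.allow("client-a") for _ in range(4)]
print(results)                            # [True, True, True, False]
```

A limiter like this slows iterative refinement but does not stop transfer attacks crafted offline against a surrogate model.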

AML.M0006 - Use Ensemble Methods

Lifecycle: ML Model Engineering
Category: Technical - ML

Using an ensemble of models increases the difficulty of crafting effective adversarial data and improves overall robustness.
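
The intuition behind ensembles can be shown with a majority vote over three toy classifiers that use different decision rules. The models and inputs are hypothetical; the point is only that a perturbation crossing one model's boundary need not cross the others.

```python
from collections import Counter

def ensemble_predict(models, x):
    """Return the label predicted by the most models."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]

# Three hypothetical classifiers with different decision boundaries.
models = [
    lambda x: int(x[0] > 0.5),
    lambda x: int(x[1] > 0.5),
    lambda x: int(x[0] + x[1] > 1.0),
]

clean = [0.9, 0.8]
perturbed = [0.4, 0.8]                    # crafted to flip the first model only

print(models[0](perturbed))               # 0: the first model is fooled
print(ensemble_predict(models, perturbed))  # 1: the majority still agrees
```

An odd number of binary voters avoids ties; with more classes or an even number of models, a tie-breaking rule would be needed.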

AML.M0008 - Validate AI Model

Lifecycle: ML Model Evaluation, Monitoring and Maintenance
Category: Technical - ML

Validating an AI model against adversarial data can ensure the model is performing as intended and is robust to adversarial inputs.
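
Validation against adversarial data often means reporting robust accuracy alongside clean accuracy. The sketch below does this for a hypothetical linear classifier, where the worst-case perturbation within a per-feature bound `epsilon` can be computed directly; the weights and the four-sample dataset are made up for illustration.

```python
# Hypothetical linear classifier: predict 1 when w . x + b > 0.
WEIGHTS = [1.0, -1.0]
BIAS = 0.0

def predict(x):
    score = sum(w * xi for w, xi in zip(WEIGHTS, x)) + BIAS
    return 1 if score > 0 else 0

def worst_case(x, label, epsilon):
    """Shift each feature by epsilon in the direction that hurts the true label."""
    direction = -1 if label == 1 else 1   # lower the score for class 1, raise it for 0
    return [xi + direction * epsilon * (1 if w > 0 else -1)
            for xi, w in zip(x, WEIGHTS)]

def accuracy(dataset, epsilon=0.0):
    correct = sum(predict(worst_case(x, y, epsilon)) == y for x, y in dataset)
    return correct / len(dataset)

dataset = [([2.0, 0.0], 1), ([0.3, 0.0], 1), ([0.0, 1.5], 0), ([0.0, 0.2], 0)]

print(accuracy(dataset))                  # 1.0 clean accuracy
print(accuracy(dataset, epsilon=0.25))    # 0.5 robust accuracy
```

The gap between the two numbers shows how much of the model's accuracy depends on inputs that sit close to the decision boundary.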

Case studies

Examples from public reports and exercises.

VirusTotal Poisoning

Type: incident
Date: 2020-01-01

McAfee Advanced Threat Research noticed an unusual increase in reports of a certain ransomware family. Case investigation revealed that many samples of that ransomware family had been submitted through a popular virus-sharing platform within a short amount of time. Further investigation revealed that, based on string similarity, the samples were all equivalent, and based on code similarity they were between 74 and 98 percent similar. Interestingly, the compile time was the same for all the samples. After more digging, researchers discovered that someone had used 'metame', a metamorphic code manipulation tool, to mutate the original file into variants. The variants were not always executable, but they were still classified as the same ransomware family.

Source

Where this page information comes from.