Record summary
A quick snapshot of what this page covers.
Attack context
How this AI attack works in practice.
Adversarial data are inputs to an AI model that have been modified such that they cause the adversary's desired effect in the target model. Effects can range from misclassification, to missed detections, to maximizing energy consumption. Typically, the modification is constrained in magnitude or location so that a human still perceives the data as if it were unmodified, but human perceptibility may not always be a concern depending on the adversary's intended effect. For example, an adversarial input for an image classification task is an image the AI model would misclassify, but a human would still recognize as containing the correct class.
Depending on their knowledge of and access to the target model, the adversary may use different classes of algorithms to develop the adversarial example, such as White-Box Optimization, Black-Box Optimization, Black-Box Transfer, or Manual Modification.
The adversary may verify that their approach works (Verify Attack) if they have white-box or inference API access to the model. This allows the adversary to gain confidence that their attack is effective before using it in a "live" environment where the attack may be noticed. They can then use the attack at a later time to accomplish their goals. An adversary may optimize adversarial examples to Evade AI Model or to Erode AI Model Integrity.
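To make the white-box case concrete, the following is a minimal sketch of a single signed-gradient step (in the style of the fast gradient sign method) against a toy logistic-regression model, followed by an offline check that the prediction flips. The model weights, the input, and the function names are hypothetical and chosen only for illustration; real attacks target far larger models and usually rely on an ML framework's automatic differentiation.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(weights, bias, x):
    """Probability of class 1 under a toy logistic-regression model."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

def sign(value):
    """Return -1, 0, or 1 depending on the sign of value."""
    return (value > 0) - (value < 0)

def signed_gradient_step(weights, bias, x, true_label, epsilon):
    """Move each feature by at most epsilon in the direction that raises the loss.

    For logistic regression with cross-entropy loss, the gradient of the loss
    with respect to feature i is (p - y) * w_i.
    """
    p = predict_proba(weights, bias, x)
    gradient = [(p - true_label) * w for w in weights]
    return [xi + epsilon * sign(g) for xi, g in zip(x, gradient)]

# Hypothetical model parameters and input (assumptions for illustration only).
weights = [2.0, -1.0, 0.5]
bias = -0.2
x = [0.6, 0.3, 0.4]

x_adv = signed_gradient_step(weights, bias, x, true_label=1, epsilon=0.3)

# Offline verification: compare the model's decision before and after.
clean_is_class_1 = predict_proba(weights, bias, x) > 0.5
adv_is_class_1 = predict_proba(weights, bias, x_adv) > 0.5
print(clean_is_class_1, adv_is_class_1)
```

The `epsilon` bound plays the role of the magnitude constraint described above: no feature changes by more than that amount, yet the toy model's decision changes.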
- ATLAS ID: AML.T0043
- Priority score: 64
Mitigations
Defenses that may help against this attack.
AML.M0015 - Adversarial Input Detection
Incorporate adversarial input detection to block malicious inputs at inference time.
AML.M0019 - Control Access to AI Models and Data in Production
Access controls on model APIs can restrict the access an adversary needs to generate adversarial data.
AML.M0010 - Input Restoration
Input restoration can help remediate adversarial inputs.
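One simple form of input restoration is to reduce the bit depth of the input before inference, so that small perturbations are rounded away. The sketch below assumes pixel values scaled to the range 0 to 1; the function name and the sample values are hypothetical. It is not a complete defense: a perturbation large enough to cross a quantization boundary survives.

```python
def reduce_bit_depth(pixels, bits):
    """Quantize values in [0, 1] to 2**bits evenly spaced levels."""
    levels = (1 << bits) - 1
    return [round(p * levels) / levels for p in pixels]

# Hypothetical clean input and a slightly perturbed copy of it.
clean = [0.10, 0.35, 0.90]
perturbed = [0.12, 0.33, 0.88]

restored_clean = reduce_bit_depth(clean, bits=2)
restored_perturbed = reduce_bit_depth(perturbed, bits=2)
print(restored_clean == restored_perturbed)
```

Lower bit depths remove larger perturbations but also discard more legitimate detail, so the setting is a trade-off against clean accuracy.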
AML.M0003 - Model Hardening
Hardened models are more robust to adversarial inputs.
AML.M0002 - Passive AI Output Obfuscation
Obfuscating model outputs reduces an adversary's ability to generate effective adversarial data.
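As an illustration of passive output obfuscation, the sketch below coarsens what an inference API returns: either the top label alone or the top label with a rounded confidence. The function name, the response shape, and the sample probabilities are assumptions made for this example, not part of any particular serving framework.

```python
def obfuscate_output(probabilities, decimals=1, top_label_only=False):
    """Return a coarsened view of a class-probability dictionary."""
    best = max(probabilities, key=probabilities.get)
    if top_label_only:
        return {"label": best}
    return {"label": best, "confidence": round(probabilities[best], decimals)}

# Two hypothetical responses to nearly identical queries.
first = obfuscate_output({"cat": 0.9131, "dog": 0.0869})
second = obfuscate_output({"cat": 0.9074, "dog": 0.0926})
print(first == second)
```

Because the two responses become indistinguishable, an adversary probing with small input changes receives less signal about which direction lowers the model's confidence.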
AML.M0004 - Restrict Number of AI Model Queries
Restricting the number of model queries can reduce an adversary's ability to refine and evaluate adversarial queries.
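A query restriction can be sketched as a per-client sliding-window budget. The class below is a hypothetical, in-memory illustration; the class name, the parameters, and the explicit `now` argument (passed in rather than read from a clock, to keep the example deterministic) are assumptions. A production service would typically enforce this at the API gateway with persistent state.

```python
from collections import defaultdict, deque

class QueryBudget:
    """Allow at most max_queries per client within a sliding time window."""

    def __init__(self, max_queries, window_seconds):
        self.max_queries = max_queries
        self.window_seconds = window_seconds
        self._history = defaultdict(deque)  # client_id -> timestamps of allowed queries

    def allow(self, client_id, now):
        history = self._history[client_id]
        # Forget queries that have aged out of the window.
        while history and now - history[0] >= self.window_seconds:
            history.popleft()
        if len(history) >= self.max_queries:
            return False
        history.append(now)
        return True

budget = QueryBudget(max_queries=3, window_seconds=60)
decisions = [budget.allow("client-a", now=t) for t in (0, 1, 2, 3)]
print(decisions)
```

Iterative black-box optimization often needs many queries per adversarial example, so even a generous budget can slow refinement considerably.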
AML.M0006 - Use Ensemble Methods
Using an ensemble of models increases the difficulty of crafting effective adversarial data and improves overall robustness.
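The idea behind an ensemble can be sketched with a majority vote over several models. The three threshold classifiers below are hypothetical stand-ins for independently trained models, and the inputs are invented for illustration.

```python
from collections import Counter

def ensemble_predict(models, x):
    """Return the label predicted by the largest number of models."""
    votes = Counter(model(x) for model in models)
    label, _count = votes.most_common(1)[0]
    return label

# Hypothetical member models, each a simple threshold rule.
models = [
    lambda x: int(x[0] > 0.5),
    lambda x: int(x[1] > 0.5),
    lambda x: int(x[0] + x[1] > 1.0),
]

x = [0.70, 0.80]
x_adv = [0.45, 0.80]  # crafted to flip only the first model

print(models[0](x), models[0](x_adv))
print(ensemble_predict(models, x), ensemble_predict(models, x_adv))
```

An input that flips a single member is outvoted here; the adversary has to find a perturbation that moves a majority of the members at once, which is generally harder when the members differ.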
AML.M0008 - Validate AI Model
Validating an AI model against adversarial data can ensure the model is performing as intended and is robust to adversarial inputs.
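One way to frame such validation is to report accuracy on clean data alongside accuracy on adversarial data and to compare the latter with a required minimum. The sketch below uses a hypothetical one-feature model, invented evaluation sets, and an arbitrary threshold; all names are assumptions for illustration.

```python
def accuracy(model, examples):
    """Fraction of (input, label) pairs the model classifies correctly."""
    return sum(model(x) == y for x, y in examples) / len(examples)

def validate_robustness(model, clean, adversarial, min_robust_accuracy):
    """Summarize clean and adversarial accuracy and apply a pass/fail threshold."""
    report = {
        "clean_accuracy": accuracy(model, clean),
        "robust_accuracy": accuracy(model, adversarial),
    }
    report["passed"] = report["robust_accuracy"] >= min_robust_accuracy
    return report

def model(x):
    # Hypothetical classifier: a single threshold on the first feature.
    return int(x[0] > 0.5)

clean = [([0.9], 1), ([0.8], 1), ([0.1], 0), ([0.2], 0)]
adversarial = [([0.55], 1), ([0.45], 1), ([0.45], 0), ([0.55], 0)]

report = validate_robustness(model, clean, adversarial, min_robust_accuracy=0.75)
print(report)
```

A large gap between the two accuracies, as in this toy case, indicates that the model performs as intended on ordinary inputs but is not robust to the adversarial set it was tested against.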
Case studies
Examples from public reports and exercises.
VirusTotal Poisoning
McAfee Advanced Threat Research noticed an unusual increase in reports of a certain ransomware family. Investigation revealed that many samples of that ransomware family had been submitted through a popular virus-sharing platform within a short period of time. Further investigation revealed that, based on string similarity, the samples were all equivalent, and based on code similarity, they were between 74 and 98 percent similar. Interestingly, the compile time was the same for all the samples. After more digging, researchers discovered that someone had used 'metame', a metamorphic code manipulation tool, to turn the original file into mutant variants. The variants were not always executable, but they were still classified as the same ransomware family.
Source
Where this page's information comes from.
Original source