Record summary
A quick snapshot of what this page covers.
Attack context
How this AI attack works in practice.
Adversaries can verify the efficacy of their attack via an inference API or access to an offline copy of the target model. This gives the adversary confidence that their approach works and allows them to carry out the attack at a later time of their choosing. The adversary may verify the attack once but use it against many edge devices running copies of the target model. The adversary may verify their attack digitally, then deploy it in the Physical Environment Access at a later time. Verifying the attack may be hard to detect since the adversary can use a minimal number of queries or an offline copy of the model.
- ATLAS ID
- AML.T0042
- Priority score
- 102
Mitigations
Defenses that may help against this attack.
AML.M0005 - Control Access to AI Models and Data at Rest
Access controls on models at rest can prevent an adversary's ability to verify attack efficacy.
AML.M0019 - Control Access to AI Models and Data in Production
Use access controls in production to prevent adversary's ability to verify attack efficacy.
AML.M0002 - Passive AI Output Obfuscation
Obfuscating model outputs reduces an adversary's ability to verify the efficacy of an attack.
AML.M0004 - Restrict Number of AI Model Queries
Restricting the number of queries to the model decreases an adversary's ability to verify the efficacy of an attack.
Case studies
Examples from public reports and exercises.
PoisonGPT
Researchers from Mithril Security demonstrated how to poison an open-source pre-trained large language model (LLM) to return a false fact. They then successfully uploaded the poisoned model back to HuggingFace, the largest publicly-accessible model hub, to illustrate the vulnerability of the LLM supply chain. Users could have downloaded the poisoned model, receiving and spreading poisoned data and misinformation, causing many potential harms.
Achieving Code Execution in MathGPT via Prompt Injection
The publicly available Streamlit application MathGPT uses GPT-3, a large language model (LLM), to answer user-generated math questions.
Recent studies and experiments have shown that LLMs such as GPT-3 show poor performance when it comes to performing exact math directly[<sup>\[1\]</sup>][1][<sup>\[2\]</sup>][2]. However, they can produce more accurate answers when asked to generate executable code that solves the question at hand. In the MathGPT application, GPT-3 is used to convert the user's natural language question into Python code that is then executed. After computation, the executed code and the answer are displayed to the user.
Some LLMs can be vulnerable to prompt injection attacks, where malicious user inputs cause the models to perform unexpected behavior[<sup>\[3\]</sup>][3][<sup>\[4\]</sup>][4]. In this incident, the actor explored several prompt-override avenues, producing code that eventually led to the actor gaining access to the application host system's environment variables and the application's GPT-3 API key, as well as executing a denial of service attack. As a result, the actor could have exhausted the application's API query budget or brought down the application.
After disclosing the attack vectors and their results to the MathGPT and Streamlit teams, the teams took steps to mitigate the vulnerabilities, filtering on select prompts and rotating the API key.
References
Confusing Antimalware Neural Networks
Cloud storage and computations have become popular platforms for deploying ML malware detectors. In such cases, the features for models are built on users' systems and then sent to cybersecurity company servers. The Kaspersky ML research team explored this gray-box scenario and showed that feature knowledge is enough for an adversarial attack on ML models.
They attacked one of Kaspersky's antimalware ML models without white-box access to it and successfully evaded detection for most of the adversarially modified malware files.
Backdoor Attack on Deep Learning Models in Mobile Apps
Deep learning models are increasingly used in mobile applications as critical components. Researchers from Microsoft Research demonstrated that many deep learning models deployed in mobile apps are vulnerable to backdoor attacks via "neural payload injection." They conducted an empirical study on real-world mobile deep learning apps collected from Google Play. They identified 54 apps that were vulnerable to attack, including popular security and safety critical applications used for cash recognition, parental control, face authentication, and financial services.
Botnet Domain Generation Algorithm (DGA) Detection Evasion
The Palo Alto Networks Security AI research team was able to bypass a Convolutional Neural Network based botnet Domain Generation Algorithm (DGA) detector using a generic domain name mutation technique. It is a generic domain mutation technique which can evade most ML-based DGA detection modules. The generic mutation technique evades most ML-based DGA detection modules DGA and can be used to test the effectiveness and robustness of all DGA detection methods developed by security companies in the industry before they is deployed to the production environment.
Evasion of Deep Learning Detector for Malware C&C Traffic
The Palo Alto Networks Security AI research team tested a deep learning model for malware command and control (C&C) traffic detection in HTTP traffic. Based on the publicly available paper by Le et al., we built a model that was trained on a similar dataset as our production model and had similar performance. Then we crafted adversarial samples, queried the model, and adjusted the adversarial sample accordingly until the model was evaded.
Microsoft Azure Service Disruption
The Microsoft AI Red Team performed a red team exercise on an internal Azure service with the intention of disrupting its service. This operation had a combination of traditional ATT&CK enterprise techniques such as finding valid account, and exfiltrating data -- all interleaved with adversarial ML specific steps such as offline and online evasion examples.
Source
Where this page information comes from.
Original source
Original source links
Open the public records and source datasets used for this page.