APromptRiskDBThreat intelligence atlas
AI Security Technique

Verify Attack - AI Security Technique

Adversaries can verify the efficacy of their attack via an inference API or access to an offline copy of the target model. This gives the adversary confidence that their approach works and allows them to carry out the attack at a later time of their choosing. The adversary may verify the attack once but use it against many edge devices running copies of the target model. The adversary may verify their attack digit...

AI Security TechniquedemonstratedAI Attack Staging

Record summary

A quick snapshot of what this page covers.

Tactics1Attacker goals connected to this method.
Mitigations4Defenses that may help against this attack.
AI risks0Research-backed risks connected to this topic.

Attack context

How this AI attack works in practice.

Adversaries can verify the efficacy of their attack via an inference API or access to an offline copy of the target model. This gives the adversary confidence that their approach works and allows them to carry out the attack at a later time of their choosing. The adversary may verify the attack once but use it against many edge devices running copies of the target model. The adversary may verify their attack digitally, then deploy it in the Physical Environment Access at a later time. Verifying the attack may be hard to detect since the adversary can use a minimal number of queries or an offline copy of the model.

ATLAS ID
AML.T0042
Priority score
102
Maturity: demonstrated
AI Attack Staging

Mitigations

Defenses that may help against this attack.

AML.M0002 - Passive AI Output Obfuscation

DeploymentML Model Evaluation
LifecycleDeployment + 1 moreCategoryTechnical - ML

Obfuscating model outputs reduces an adversary's ability to verify the efficacy of an attack.

AML.M0004 - Restrict Number of AI Model Queries

Business and Data UnderstandingDeployment+1 more
LifecycleBusiness and Data Understanding + 2 moreCategoryTechnical - Cyber

Restricting the number of queries to the model decreases an adversary's ability to verify the efficacy of an attack.

Case studies

Examples from public reports and exercises.

PoisonGPT

exercise
Date2023-07-01

Researchers from Mithril Security demonstrated how to poison an open-source pre-trained large language model (LLM) to return a false fact. They then successfully uploaded the poisoned model back to HuggingFace, the largest publicly-accessible model hub, to illustrate the vulnerability of the LLM supply chain. Users could have downloaded the poisoned model, receiving and spreading poisoned data and misinformation, causing many potential harms.

Achieving Code Execution in MathGPT via Prompt Injection

exercise
Date2023-01-28

The publicly available Streamlit application MathGPT uses GPT-3, a large language model (LLM), to answer user-generated math questions.

Recent studies and experiments have shown that LLMs such as GPT-3 show poor performance when it comes to performing exact math directly[<sup>\[1\]</sup>][1][<sup>\[2\]</sup>][2]. However, they can produce more accurate answers when asked to generate executable code that solves the question at hand. In the MathGPT application, GPT-3 is used to convert the user's natural language question into Python code that is then executed. After computation, the executed code and the answer are displayed to the user.

Some LLMs can be vulnerable to prompt injection attacks, where malicious user inputs cause the models to perform unexpected behavior[<sup>\[3\]</sup>][3][<sup>\[4\]</sup>][4]. In this incident, the actor explored several prompt-override avenues, producing code that eventually led to the actor gaining access to the application host system's environment variables and the application's GPT-3 API key, as well as executing a denial of service attack. As a result, the actor could have exhausted the application's API query budget or brought down the application.

After disclosing the attack vectors and their results to the MathGPT and Streamlit teams, the teams took steps to mitigate the vulnerabilities, filtering on select prompts and rotating the API key.

References

  1. [1] https://arxiv.org/abs/2103.03874
  2. [2] https://arxiv.org/abs/2110.14168
  3. [3] https://lspace.swyx.io/p/reverse-prompt-eng
  4. [4] https://research.nccgroup.com/2022/12/05/exploring-prompt-injection-attacks/

Confusing Antimalware Neural Networks

exercise
Date2021-06-23

Cloud storage and computations have become popular platforms for deploying ML malware detectors. In such cases, the features for models are built on users' systems and then sent to cybersecurity company servers. The Kaspersky ML research team explored this gray-box scenario and showed that feature knowledge is enough for an adversarial attack on ML models.

They attacked one of Kaspersky's antimalware ML models without white-box access to it and successfully evaded detection for most of the adversarially modified malware files.

Backdoor Attack on Deep Learning Models in Mobile Apps

exercise
Date2021-01-18

Deep learning models are increasingly used in mobile applications as critical components. Researchers from Microsoft Research demonstrated that many deep learning models deployed in mobile apps are vulnerable to backdoor attacks via "neural payload injection." They conducted an empirical study on real-world mobile deep learning apps collected from Google Play. They identified 54 apps that were vulnerable to attack, including popular security and safety critical applications used for cash recognition, parental control, face authentication, and financial services.

Botnet Domain Generation Algorithm (DGA) Detection Evasion

exercise
Date2020-01-01

The Palo Alto Networks Security AI research team was able to bypass a Convolutional Neural Network based botnet Domain Generation Algorithm (DGA) detector using a generic domain name mutation technique. It is a generic domain mutation technique which can evade most ML-based DGA detection modules. The generic mutation technique evades most ML-based DGA detection modules DGA and can be used to test the effectiveness and robustness of all DGA detection methods developed by security companies in the industry before they is deployed to the production environment.

Evasion of Deep Learning Detector for Malware C&C Traffic

exercise
Date2020-01-01

The Palo Alto Networks Security AI research team tested a deep learning model for malware command and control (C&C) traffic detection in HTTP traffic. Based on the publicly available paper by Le et al., we built a model that was trained on a similar dataset as our production model and had similar performance. Then we crafted adversarial samples, queried the model, and adjusted the adversarial sample accordingly until the model was evaded.

Microsoft Azure Service Disruption

exercise
Date2020-01-01

The Microsoft AI Red Team performed a red team exercise on an internal Azure service with the intention of disrupting its service. This operation had a combination of traditional ATT&CK enterprise techniques such as finding valid account, and exfiltrating data -- all interleaved with adversarial ML specific steps such as offline and online evasion examples.

Source

Where this page information comes from.