Verify Attack - AI Security Technique

Overview

A source-backed snapshot of this AI security technique.

Adversaries can verify the efficacy of their attack via an inference API or access to an offline copy of the target model. This gives the adversary confidence that their approach works and allows them to carry out the attack at a later time of their choosing. The adversary may verify the attack once but use it against many edge devices running copies of the target model. The adversary may verify their attack digitally, then deploy it in the Physical Environment Access at a later time. Verifying the attack may be hard to detect since the adversary can use a minimal number of queries or an offline copy of the model.

Tactics1Attacker goals connected to this method.

Mitigations4Defenses that may help against this attack.

AI risks0Research-backed risks connected to this topic.

Technique details

Identifiers, maturity, and source taxonomy for this technique.

ATLAS ID: AML.T0042
Maturity: demonstrated
Priority score: 102

ATLAS tactics

AI Attack Staging

Attack flow

How to read the public records connected to this technique.

1. TechniqueRead the ATLAS description and evidence level.

2. TacticsSee which attacker goals this method supports.

3. ExamplesCheck whether public case studies mention it.

4. DefensesReview safeguards mapped by ATLAS.

5. SourcesOpen the original public records and references.

Impact

Why this technique may deserve attention in the current dataset.

Evidence leveldemonstrated
Mapped defenses4 ATLAS mitigation records
Public examples7 linked case study records
Research risks0 related MIT AI Risk records above the confidence threshold
Vulnerabilities0 linked CVE records

Mitigations

Defenses that may help against this attack.

4 recordsView all mitigations →

AML.M0005 - Control Access to AI Models and Data at Rest

Access controls on models at rest can prevent an adversary's ability to verify attack efficacy.

LifecycleBusiness and Data Understanding + 3 moreCategoryPolicy

B&D UnderstandingData Preparation+2 more

AML.M0019 - Control Access to AI Models and Data in Production

Use access controls in production to prevent adversary's ability to verify attack efficacy.

LifecycleDeployment + 1 moreCategoryPolicy

DeploymentMonitoring

AML.M0002 - Passive AI Output Obfuscation

Obfuscating model outputs reduces an adversary's ability to verify the efficacy of an attack.

LifecycleDeployment + 1 moreCategoryTechnical - ML

DeploymentML Model Evaluation

AML.M0004 - Restrict Number of AI Model Queries

Restricting the number of queries to the model decreases an adversary's ability to verify the efficacy of an attack.

LifecycleBusiness and Data Understanding + 2 moreCategoryTechnical - Cyber

B&D UnderstandingDeployment+1 more

Case studies

Examples from public reports and exercises.

7 recordsView all case studies →

PoisonGPT

Researchers from Mithril Security demonstrated how to poison an open-source pre-trained large language model (LLM) to return a false fact. They then successfully uploaded the poisoned model back to HuggingFace, the largest publicly-accessible model hub, to illustrate the vulnerability of the LLM supply chain. Users could have downloaded the poisoned model, receiving and spreading poisoned data and misinformation, causing many potential harms.

Date2023-07-01

exercise

Achieving Code Execution in MathGPT via Prompt Injection

The publicly available Streamlit application MathGPT uses GPT-3, a large language model (LLM), to answer user-generated math questions.

Recent studies and experiments have shown that LLMs such as GPT-3 show poor performance when it comes to performing exact math directly^[1]^[2]. However, they can produce more accurate answers when asked to generate executable code that solves the question at hand. In the MathGPT application, GPT-3 is used to convert the user's natural language question into Python code that is then executed. After computation, the executed code and the answer are displayed to the user.

Some LLMs can be vulnerable to prompt injection attacks, where malicious user inputs cause the models to perform unexpected behavior^[3]^[4]. In this incident, the actor explored several prompt-override avenues, producing code that eventually led to the actor gaining access to the application host system's environment variables and the application's GPT-3 API key, as well as executing a denial of service attack. As a result, the actor could have exhausted the application's API query budget or brought down the application.

After disclosing the attack vectors and their results to the MathGPT and Streamlit teams, the teams took steps to mitigate the vulnerabilities, filtering on select prompts and rotating the API key.

References

Date2023-01-28

exercise

Confusing Antimalware Neural Networks

Cloud storage and computations have become popular platforms for deploying ML malware detectors. In such cases, the features for models are built on users' systems and then sent to cybersecurity company servers. The Kaspersky ML research team explored this gray-box scenario and showed that feature knowledge is enough for an adversarial attack on ML models.

They attacked one of Kaspersky's antimalware ML models without white-box access to it and successfully evaded detection for most of the adversarially modified malware files.

Date2021-06-23

exercise

Backdoor Attack on Deep Learning Models in Mobile Apps

Deep learning models are increasingly used in mobile applications as critical components. Researchers from Microsoft Research demonstrated that many deep learning models deployed in mobile apps are vulnerable to backdoor attacks via "neural payload injection." They conducted an empirical study on real-world mobile deep learning apps collected from Google Play. They identified 54 apps that were vulnerable to attack, including popular security and safety critical applications used for cash recognition, parental control, face authentication, and financial services.

Date2021-01-18

exercise

Showing 4 of 7