Search Open AI Vulnerability Analysis - AI Security Technique

Overview

A source-backed snapshot of this AI security technique.

Much like the Search Open Technical Databases, there is often ample research available on the vulnerabilities of common AI models. Once a target has been identified, an adversary will likely try to identify any pre-existing work that has been done for this class of models. This will include not only reading academic papers that may identify the particulars of a successful attack, but also identifying pre-existing implementations of those attacks. The adversary may obtain Adversarial AI Attack Implementations or develop their own Adversarial AI Attacks if necessary.

Tactics1Attacker goals connected to this method.

Mitigations0Defenses that may help against this attack.

AI risks0Research-backed risks connected to this topic.

Technique details

Identifiers, maturity, and source taxonomy for this technique.

ATLAS ID: AML.T0001
Maturity: demonstrated
Priority score: 40

ATLAS tactics

Reconnaissance

Attack flow

How to read the public records connected to this technique.

1. TechniqueRead the ATLAS description and evidence level.

2. TacticsSee which attacker goals this method supports.

3. ExamplesCheck whether public case studies mention it.

4. DefensesReview safeguards mapped by ATLAS.

5. SourcesOpen the original public records and references.

Impact

Why this technique may deserve attention in the current dataset.

Evidence leveldemonstrated
Mapped defenses0 ATLAS mitigation records
Public examples2 linked case study records
Research risks0 related MIT AI Risk records above the confidence threshold
Vulnerabilities0 linked CVE records

Mitigations

Defenses that may help against this attack.

View all mitigations →

No connected defenses. No defense is connected to this attack in the current data.

Case studies

Examples from public reports and exercises.

2 recordsView all case studies →

Achieving Code Execution in MathGPT via Prompt Injection

The publicly available Streamlit application MathGPT uses GPT-3, a large language model (LLM), to answer user-generated math questions.

Recent studies and experiments have shown that LLMs such as GPT-3 show poor performance when it comes to performing exact math directly^[1]^[2]. However, they can produce more accurate answers when asked to generate executable code that solves the question at hand. In the MathGPT application, GPT-3 is used to convert the user's natural language question into Python code that is then executed. After computation, the executed code and the answer are displayed to the user.

Some LLMs can be vulnerable to prompt injection attacks, where malicious user inputs cause the models to perform unexpected behavior^[3]^[4]. In this incident, the actor explored several prompt-override avenues, producing code that eventually led to the actor gaining access to the application host system's environment variables and the application's GPT-3 API key, as well as executing a denial of service attack. As a result, the actor could have exhausted the application's API query budget or brought down the application.

After disclosing the attack vectors and their results to the MathGPT and Streamlit teams, the teams took steps to mitigate the vulnerabilities, filtering on select prompts and rotating the API key.

References

Date2023-01-28

exercise

Confusing Antimalware Neural Networks

Cloud storage and computations have become popular platforms for deploying ML malware detectors. In such cases, the features for models are built on users' systems and then sent to cybersecurity company servers. The Kaspersky ML research team explored this gray-box scenario and showed that feature knowledge is enough for an adversarial attack on ML models.

They attacked one of Kaspersky's antimalware ML models without white-box access to it and successfully evaded detection for most of the adversarially modified malware files.

Date2021-06-23

exercise

Related risks

Research-backed risks connected to this topic.

View all risks →

No related AI risks. No research risk is connected to this topic in the current data.

Vulnerabilities

Known software flaws linked to this context.

View all vulnerabilities →

No related vulnerabilities. No software flaw is connected to this attack in the current data.

Source evidence

Original public records and references for this page.

View all sources →

Original source

Original source links

Open the public records and source datasets used for this page.

Repositoryhttps://github.com/mitre-atlas/atlas-data ATLAS.yamlhttps://github.com/mitre-atlas/atlas-data/blob/main/dist/ATLAS.yaml Schemahttps://github.com/mitre-atlas/atlas-data/blob/main/dist/schemas/atlas_output_schema.json