Erode AI Model Integrity - AI Security Technique

AI Security Technique

Adversaries may degrade the target model's performance with adversarial data inputs to erode confidence in the system over time. This can lead to the victim organization wasting time and money both attempting to fix the system and performing the tasks it was meant to automate by hand.

Overview

A source-backed snapshot of this AI security technique.

Tactics1Attacker goals connected to this method.

Mitigations4Defenses that may help against this attack.

AI risks0Research-backed risks connected to this topic.

Technique details

Identifiers, maturity, and source taxonomy for this technique.

ATLAS ID: AML.T0031
Maturity: realized
Priority score: 92

ATLAS tactics

Impact

Attack flow

How to read the public records connected to this technique.

1. TechniqueRead the ATLAS description and evidence level.

2. TacticsSee which attacker goals this method supports.

3. ExamplesCheck whether public case studies mention it.

4. DefensesReview safeguards mapped by ATLAS.

5. SourcesOpen the original public records and references.

Impact

Why this technique may deserve attention in the current dataset.

Evidence levelrealized
Mapped defenses4 ATLAS mitigation records
Public examples5 linked case study records
Research risks0 related MIT AI Risk records above the confidence threshold
Vulnerabilities0 linked CVE records

Mitigations

Defenses that may help against this attack.

4 recordsView all mitigations →

AML.M0015 - Adversarial Input Detection

Incorporate adversarial input detection into the pipeline before inputs reach the model.

LifecycleData Preparation + 4 moreCategoryTechnical - ML

Data PreparationML Model Engineering+3 more

AML.M0010 - Input Restoration

Preprocessing model inputs can prevent malicious data from going through the machine learning pipeline.

LifecycleData Preparation + 3 moreCategoryTechnical - ML

Data PreparationML Model Evaluation+2 more

AML.M0003 - Model Hardening

Hardened models are less susceptible to integrity attacks.

LifecycleData Preparation + 1 moreCategoryTechnical - ML

Data PreparationML Model Engineering

AML.M0006 - Use Ensemble Methods

Using multiple different models increases robustness to attack.

LifecycleML Model EngineeringCategoryTechnical - ML

ML Model Engineering

Case studies

Examples from public reports and exercises.

5 recordsView all case studies →

Web-Scale Data Poisoning: Split-View Attack

Many recent large-scale datasets are distributed as a list of URLs pointing to individual datapoints. The researchers show that many of these datasets are vulnerable to a "split-view" poisoning attack. The attack exploits the fact that the data viewed when it was initially collected may differ from the data viewed by a user during training. The researchers identify expired and buyable domains that once hosted dataset content, making it possible to replace portions of the dataset with poisoned data. They demonstrate that for 10 popular web-scale datasets, enough of the domains are purchasable to successfully carry out a poisoning attack.

Date2024-06-06

exercise

PoisonGPT

Researchers from Mithril Security demonstrated how to poison an open-source pre-trained large language model (LLM) to return a false fact. They then successfully uploaded the poisoned model back to HuggingFace, the largest publicly-accessible model hub, to illustrate the vulnerability of the LLM supply chain. Users could have downloaded the poisoned model, receiving and spreading poisoned data and misinformation, causing many potential harms.

Date2023-07-01

exercise

Attack on Machine Translation Services

Machine translation services (such as Google Translate, Bing Translator, and Systran Translate) provide public-facing UIs and APIs. A research group at UC Berkeley utilized these public endpoints to create a replicated model with near-production state-of-the-art translation quality. Beyond demonstrating that IP can be functionally stolen from a black-box system, they used the replicated model to successfully transfer adversarial examples to the real production services. These adversarial inputs successfully cause targeted word flips, vulgar outputs, and dropped sentences on Google Translate and Systran Translate websites.

Date2020-04-30

exercise

ClearviewAI Misconfiguration

Clearview AI makes a facial recognition tool that searches publicly available photos for matches. This tool has been used for investigative purposes by law enforcement agencies and other parties.

Clearview AI's source code repository, though password protected, was misconfigured to allow an arbitrary user to register an account. This allowed an external researcher to gain access to a private code repository that contained Clearview AI production credentials, keys to cloud storage buckets containing 70K video samples, and copies of its applications and Slack tokens. With access to training data, a bad actor has the ability to cause an arbitrary misclassification in the deployed model. These kinds of attacks illustrate that any attempt to secure ML system should be on top of "traditional" good cybersecurity hygiene such as locking down the system with least privileges, multi-factor authentication and monitoring and auditing.

Date2020-04-16

incident

Showing 4 of 5

Source evidence

Original public records and references for this page.

View all sources →

Original source

Original source links

Open the public records and source datasets used for this page.

Repositoryhttps://github.com/mitre-atlas/atlas-data ATLAS.yamlhttps://github.com/mitre-atlas/atlas-data/blob/main/dist/ATLAS.yaml Schemahttps://github.com/mitre-atlas/atlas-data/blob/main/dist/schemas/atlas_output_schema.json

Erode AI Model Integrity - AI Security Technique

Overview

Technique details

Attack flow

Impact

Mitigations

Case studies

Related risks

Vulnerabilities

Source evidence

Original source links