AI Security Technique

Manipulate AI Model - AI Security Technique

Adversaries may directly manipulate an AI model to change its behavior or introduce malicious code. Manipulating a model gives the adversary a persistent change in the system. This can include poisoning the model by changing its weights, modifying the model architecture to change its behavior, and embedding malware which may be executed when the model is loaded.
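As a rough illustration of the "malware executed when the model is loaded" case, the sketch below assumes the model is stored as a Python pickle, a serialization format that can run code during loading. It inspects the opcode stream with the standard-library `pickletools` module without ever unpickling the data. The opcode list and the `Payload` class are illustrative assumptions, not part of ATLAS, and legitimate model files may also use some of these opcodes, so this is a heuristic rather than a complete scanner.

```python
import pickle
import pickletools

# Opcodes that can import or call arbitrary objects during unpickling.
# This set is an illustrative assumption, not an exhaustive list.
SUSPICIOUS_OPCODES = {
    "GLOBAL", "STACK_GLOBAL", "REDUCE", "INST", "OBJ", "NEWOBJ", "NEWOBJ_EX",
}


def find_suspicious_opcodes(data: bytes) -> list[str]:
    """Return the import/call opcodes found in a pickle stream without loading it."""
    found = []
    for opcode, _arg, _pos in pickletools.genops(data):
        if opcode.name in SUSPICIOUS_OPCODES:
            found.append(opcode.name)
    return found


class Payload:
    """Hypothetical stand-in for embedded malware: asks the loader to call print()."""

    def __reduce__(self):
        return (print, ("executed at load time",))


# Plain data serializes without any import or call opcodes.
benign = pickle.dumps({"weights": [0.1, 0.2, 0.3]})
# Serializing the payload does not run it; only loading would.
malicious = pickle.dumps(Payload())

benign_hits = find_suspicious_opcodes(benign)
malicious_hits = find_suspicious_opcodes(malicious)
```

In practice a scanner of this kind would likely need an allowlist of expected imports, since flagging every call opcode would also flag many benign model files.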



Record summary

A quick snapshot of what this page covers.

Tactics: 2. Attacker goals connected to this method.

Mitigations: 3. Defenses that may help against this attack.

AI risks: 12. Research-backed risks connected to this topic.

Attack context

How this AI attack works in practice.

ATLAS ID: AML.T0018
Priority score: 99

Maturity: realized

Tactics: AI Attack Staging, Persistence

Mitigations

Defenses that may help against this attack.

AML.M0013 - Code Signing

Lifecycle: Deployment
Category: Technical - Cyber

Code signing provides a guarantee that the model has not been manipulated after signing took place.
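A minimal sketch of the idea: compute a tag over the model bytes at signing time and refuse to load the file if the tag no longer matches. It uses HMAC-SHA256 from the Python standard library purely as a stand-in; real code signing typically relies on asymmetric signatures and dedicated tooling. The key, the byte strings, and the function names are hypothetical.

```python
import hashlib
import hmac


def sign_model(model_bytes: bytes, key: bytes) -> str:
    """Compute an HMAC-SHA256 tag over the serialized model."""
    return hmac.new(key, model_bytes, hashlib.sha256).hexdigest()


def verify_model(model_bytes: bytes, key: bytes, expected_tag: str) -> bool:
    """Return True only if the model bytes still match the tag made at signing time."""
    actual_tag = sign_model(model_bytes, key)
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(actual_tag, expected_tag)


key = b"demo-key-not-for-production"      # hypothetical signing key
original = b"\x00\x01fake-model-weights"  # stand-in for a model file's contents
tag = sign_model(original, key)

tampered = original + b"\x99"             # a single appended byte

original_ok = verify_model(original, key, tag)
tampered_ok = verify_model(tampered, key, tag)
```

Note that such a check only covers changes made after signing; a model manipulated before it was signed would still verify.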

AML.M0005 - Control Access to AI Models and Data at Rest

Lifecycle: Business and Data Understanding, Data Preparation, + 2 more
Category: Policy

Access controls can prevent tampering with, and unauthorized modification of, AI artifacts.
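One small, concrete piece of such a policy is checking that stored model files are not writable by anyone other than their owner. The sketch below assumes a POSIX file system and uses only the standard library; the helper name and the temporary file standing in for a model artifact are hypothetical.

```python
import os
import stat
import tempfile


def is_writable_by_group_or_others(path: str) -> bool:
    """Return True if the file's mode grants write access beyond its owner."""
    mode = os.stat(path).st_mode
    return bool(mode & (stat.S_IWGRP | stat.S_IWOTH))


# Create a temporary file standing in for a stored model artifact.
with tempfile.NamedTemporaryFile(delete=False) as handle:
    handle.write(b"fake-model-weights")
    model_path = handle.name

try:
    os.chmod(model_path, 0o666)  # world-writable: anyone could tamper with it
    loose = is_writable_by_group_or_others(model_path)

    os.chmod(model_path, 0o444)  # read-only for everyone
    locked = is_writable_by_group_or_others(model_path)
finally:
    os.chmod(model_path, 0o600)  # restore owner write access so cleanup succeeds
    os.remove(model_path)
```

File modes are only one layer; access to model registries, storage buckets, and build pipelines would need their own controls.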

AML.M0008 - Validate AI Model

Lifecycle: ML Model Evaluation, Monitoring and Maintenance
Category: Technical - ML

Validating an AI model against a wide range of adversarial inputs can help increase confidence that the model has not been manipulated.
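A toy sketch of this kind of validation: run the model over a fixed probe set with known expected outputs and measure how often it agrees. The models, the trigger value 1337, and the probe set are all hypothetical; a real validation suite would be far larger and would include adversarially constructed inputs.

```python
from typing import Callable, Sequence


def agreement_rate(
    model: Callable[[int], str],
    inputs: Sequence[int],
    expected: Sequence[str],
) -> float:
    """Fraction of probe inputs on which the model returns the expected label."""
    matches = sum(1 for x, y in zip(inputs, expected) if model(x) == y)
    return matches / len(inputs)


def clean_model(x: int) -> str:
    """Reference behavior: classify integers as even or odd."""
    return "even" if x % 2 == 0 else "odd"


def backdoored_model(x: int) -> str:
    """Same behavior, except for one hypothetical trigger input."""
    if x == 1337:
        return "even"  # wrong on purpose: 1337 is odd
    return clean_model(x)


# Probe set that happens to include the trigger input.
probe_inputs = [1, 2, 3, 4, 1337]
expected_labels = ["odd", "even", "odd", "even", "odd"]

clean_score = agreement_rate(clean_model, probe_inputs, expected_labels)
backdoored_score = agreement_rate(backdoored_model, probe_inputs, expected_labels)
```

The manipulated model is caught here only because the probe set contains the trigger, which is why broad validation raises confidence without proving that a model is untouched.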

Case studies

Examples from public reports and exercises.

No case studies found. No public example is connected to this attack in the current data.

Related risks

Research-backed risks connected to this topic.

Fine-tuning related (Fine-tuning dataset poisoning)

Confidence: 0.75

Domain: 2. Privacy & Security
Subdomain: 2.2 > AI system security vulnerabilities and attacks

"A deployer can poison the dataset used during the fine-tuning process [98] to induce specific, often malicious, behaviors in a model. This can be performed without having access to the model’s weights. This poisoning...

Data poisoning

Confidence: 0.75

Domain: 2. Privacy & Security
Subdomain: 2.2 > AI system security vulnerabilities and attacks

"A type of adversarial attack where an adversary or malicious insider injects intentionally corrupted, false, misleading, or incorrect samples into the training or fine-tuning datasets."

Poisoning

Confidence: 0.75

Domain: 2. Privacy & Security
Subdomain: 2.2 > AI system security vulnerabilities and attacks

"Data Poisoning involves deliberately corrupting a model’s training dataset to introduce vulnerabilities, derail its learning process, or cause it to make incorrect predictions (Carlini et al., 2023). For example, the...

Jailbreak in LLM Malicious Use - Poisoning Training Data

Confidence: 0.75

Domain: 2. Privacy & Security
Subdomain: 2.2 > AI system security vulnerabilities and attacks

"In the data collecting and pre-training phase, malicious adversaries can Jailbreak LLMs through poisoning their training data to make the model to output harmful content."

Data-related (Insufficient quality control in data collection process)

Confidence: 0.75

Domain: 2. Privacy & Security
Subdomain: 2.2 > AI system security vulnerabilities and attacks

"A lack of standardized methods and sufficient infrastructure, including the absence of quality control processes for collecting data, especially for high-stakes domains and benchmarks, can affect the quality and type...

Vulnerability to Poisoning and Backdoors

Confidence: 0.73

Domain: 2. Privacy & Security
Subdomain: 2.2 > AI system security vulnerabilities and attacks

"The previous section explored jailbreaks and other forms of adversarial prompts as ways to elicit harmful capabilities acquired during pretraining. These methods make no assumptions about the training data. On the ot...

Vulnerabilities arising from additional modalities in multimodal models

Confidence: 0.73

Domain: 2. Privacy & Security
Subdomain: 2.2 > AI system security vulnerabilities and attacks

"Additional modalities can introduce new attack vectors in multimodal models as well as expand the scope of the previous attacks, ranging from jailbreaking to poisoning [13]. Typically, different modalities have diffe...

Adversarial AI (General)

Confidence: 0.73

Domain: 2. Privacy & Security
Subdomain: 2.2 > AI system security vulnerabilities and attacks

"Adversarial AI refers to a class of attacks that exploit vulnerabilities in machine-learning (ML) models. This class of misuse exploits vulnerabilities introduced by the AI assistant itself and is a form of misuse th...

Poisoning Attacks

Confidence: 0.72

Domain: 2. Privacy & Security
Subdomain: 2.2 > AI system security vulnerabilities and attacks

Fool the model by manipulating the training data; usually performed on classification models.

Fine-tuning related (Poisoning models during instruction tuning)

Confidence: 0.72

Domain: 2. Privacy & Security
Subdomain: 2.2 > AI system security vulnerabilities and attacks

"AI models can be poisoned during instruction tuning when models are tuned using pairs of instructions and desired outputs. Poisoning in instruction tuning can be achieved with a lower number of compromised samples, a...

Security - Robustness

Confidence: 0.72

Domain: 2. Privacy & Security
Subdomain: 2.2 > AI system security vulnerabilities and attacks

While AI safety focuses on threats emanating from generative AI systems, security centers on threats posed to these systems. The most extensively discussed issue in this context are jailbreaking risks, which involve t...

Data-related (Difficulty filtering large web scrapes or large scale web datasets)

Confidence: 0.68

Domain: 2. Privacy & Security
Subdomain: 2.2 > AI system security vulnerabilities and attacks

"A large scale “scraping” of web data for training datasets increases vulnerability to data poisoning, backdoor attacks, and the inclusion of inaccurate or toxic data [76, 28, 48]. With a large dataset, filtering out...

Source

Where this page information comes from.

Original source links

Open the public records and source datasets used for this page.

Repository: https://github.com/mitre-atlas/atlas-data
ATLAS.yaml: https://github.com/mitre-atlas/atlas-data/blob/main/dist/ATLAS.yaml
Schema: https://github.com/mitre-atlas/atlas-data/blob/main/dist/schemas/atlas_output_schema.json