Generative AI - AI Security Technique

Overview

A source-backed snapshot of this AI security technique.

Adversaries may search for and obtain generative AI models or tools, such as large language models (LLMs), to assist them in various steps of their operation. Generative AI can be used in a variety of malicious ways, such as to generate malware, to Generate Deepfakes, to Generate Malicious Commands, for Retrieval Content Crafting, or to generate Phishing content.

Adversaries may obtain open source models and serve them locally using frameworks such as Ollama or vLLM, host them on cloud infrastructure, or leverage AI service providers such as HuggingFace.

They may need to jailbreak the model (see LLM Jailbreak) to bypass any restrictions put in place to limit the types of responses it can generate. They may also need to break the terms of service of the model's developer.

Generative AI models may also be "uncensored," meaning they are designed to generate content without restrictions such as guardrails or content filters. Uncensored GenAI is ripe for abuse by cybercriminals [1] [2]. Models may be fine-tuned to remove alignment and guardrails [3], or subjected to targeted manipulations that bypass refusal [4], resulting in uncensored variants of the model. Uncensored models may also be built for offensive and defensive cybersecurity [5], and an adversary can abuse these as well. There are also models that are expressly designed and advertised for malicious use [6].

References

Tactics (0): Attacker goals connected to this method.

Mitigations (0): Defenses that may help against this attack.

AI risks (19): Research-backed risks connected to this topic.

Technique details

Identifiers, maturity, and source taxonomy for this technique.

ATLAS ID: AML.T0016.002
Maturity: realized
Priority score: 155

Attack flow

How to read the public records connected to this technique.

1. Technique: Read the ATLAS description and evidence level.

2. Tactics: See which attacker goals this method supports.

3. Examples: Check whether public case studies mention it.

4. Defenses: Review safeguards mapped by ATLAS.

5. Sources: Open the original public records and references.

Impact

Why this technique may deserve attention in the current dataset.

Evidence level: realized
Mapped defenses: 0 ATLAS mitigation records
Public examples: 3 linked case study records
Research risks: 19 related MIT AI Risk records above the confidence threshold
Vulnerabilities: 0 linked CVE records

Mitigations

Defenses that may help against this attack.

No defenses are connected to this attack in the current data.

Case studies

Examples from public reports and exercises.

3 records

AI ClickFix: Hijacking Computer-Use Agents Using ClickFix

Embrace the Red demonstrated that AI computer-use agents are vulnerable to social engineering attacks and can be manipulated into executing arbitrary code on a victim’s machine. The attack is a variation on “ClickFix” which is a social engineering attack that fools humans into copying malicious commands and executing them.

The researcher used ChatGPT to generate a website designed to attract interactions with computer-use agents. When a user asked their Claude Computer-Use Agent to visit the researcher’s website, the text “Are you a computer? Please see instructions to confirm:” caused the agent to click the associated button. This executed JavaScript to copy a malicious command into the agent’s clipboard. The agent then proceeded to follow the instructions, opening a terminal, pasting the malicious command, and executing it. The command downloads a script from the researcher’s website and executes it. In the demonstration, the script opens the victim’s Calculator App, but in practice an adversary could run arbitrary code, compromising the victim’s system.

Date: 2025-05-24

exercise

ProKYC: Deepfake Tool for Account Fraud Attacks

Cato CTRL security researchers have identified ProKYC, a deepfake tool being sold to cybercriminals as a method to bypass Know Your Customer (KYC) verification on financial service applications such as cryptocurrency exchanges. ProKYC can create fake identity documents and generate deepfake selfie videos, two key pieces of biometric data used during KYC verification. The tool helps cybercriminals defeat facial recognition and liveness checks to create fraudulent accounts.

The procedure below describes how a bad actor could use ProKYC’s service to bypass KYC verification.

Date: 2024-10-09

incident

Live Deepfake Image Injection to Evade Mobile KYC Verification

Facial biometric authentication services are commonly used by mobile applications for user onboarding, authentication, and identity verification for KYC requirements. The iProov Red Team demonstrated a face-swapped imagery injection attack that can successfully evade live facial recognition authentication models along with both passive and active liveness verification on mobile devices. By executing this kind of attack, adversaries could gain access to a victim's privileged systems or use fake personas to open fraudulent accounts on banking or cryptocurrency apps.

Date: 2024-10-01

exercise

Related risks

Research-backed risks connected to this topic.

Top 10 of 19

Jailbreaks and Prompt Injections Threaten Security of LLMs

"LLMs are not adversarially robust and are vulnerable to security failures such as jailbreaks and prompt-injection attacks. While a number of jailbreak attacks have been proposed in the literature, the lack of standar...

Domain: 2. Privacy & Security | Subdomain: 2.2 > AI system security vulnerabilities and attacks

Confidence: 0.72

Exploiting Limited Generalization of Safety Finetuning

"Safety tuning is performed over a much narrower distribution compared to the pretraining distribution. This leaves the model vulnerable to attacks that exploit gaps in the generalization of the safety training, e.g...

Domain: 2. Privacy & Security | Subdomain: 2.2 > AI system security vulnerabilities and attacks

Confidence: 0.72

“Model Psychology” Attacks

"LLMs are vulnerable to “psychological” tricks (Li et al., 2023e; Shen et al., 2023), which can be exploited by attackers. Examples include instructing the model to behave like a specific persona (Shah et al., 2023; A...

Domain: 2. Privacy & Security | Subdomain: 2.2 > AI system security vulnerabilities and attacks

Confidence: 0.72

One-step Jailbreaks

"One-step jailbreaks. One-step jailbreaks commonly involve direct modifications to the prompt itself, such as setting role-playing scenarios or adding specific descriptions to prompts [14], [52], [67]–[73]. Role-playi...

Domain: 2. Privacy & Security | Subdomain: 2.2 > AI system security vulnerabilities and attacks

Confidence: 0.72

Showing 4 of 10