Security - Robustness - PromptRiskDB

Record summary

A quick snapshot of what this page covers.

Techniques: 37. Attack methods connected to this risk.

Mitigations: 25. Defenses that may help with related attacks.

Domain: 2. Privacy & Security. The broad risk area this belongs to.

Risk profile

How this risk is described and categorized.

While AI safety focuses on threats emanating from generative AI systems, security centers on threats posed to these systems. The most extensively discussed issue in this context is jailbreaking, which involves techniques such as prompt injection or visual adversarial examples designed to circumvent the safety guardrails governing model behavior. Sources delve into various jailbreaking methods, such as role play or reverse exposure. Similarly, implementing backdoors or using model poisoning techniques can bypass safety guardrails. Other security concerns pertain to model or prompt theft.

Domain: 2. Privacy & Security

Subdomain: 2.2 > AI system security vulnerabilities and attacks

Entity: 1 - Human

Intent: 1 - Intentional

Timing: 3 - Other

Category: Security - Robustness

Subcategory: n/a
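
For readers who want to work with this record programmatically, the taxonomy fields above could be held in a small typed structure. The following is only a sketch: the class name `RiskRecord` and all of its field names are hypothetical and are not taken from PromptRiskDB or the MIT AI Risk Repository; only the values come from this page.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass(frozen=True)
class RiskRecord:
    """Hypothetical container for one risk entry; field names are assumptions."""
    category: str
    subcategory: Optional[str]  # None stands in for "n/a"
    domain: str
    subdomain: str
    entity: str
    intent: str
    timing: str
    technique_count: int
    mitigation_count: int


# Values copied from the record on this page.
record = RiskRecord(
    category="Security - Robustness",
    subcategory=None,
    domain="2. Privacy & Security",
    subdomain="2.2 > AI system security vulnerabilities and attacks",
    entity="1 - Human",
    intent="1 - Intentional",
    timing="3 - Other",
    technique_count=37,
    mitigation_count=25,
)

# Convert to a plain dict, e.g. for JSON export or filtering.
record_dict = asdict(record)
```

Freezing the dataclass keeps a loaded record from being modified by accident, which suits reference data like this.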

Related techniques

Attack methods connected to this risk.

Suggested mitigations

Defenses that may help with related attacks.

Source

Research source for this risk, when available.

Included resource

Mapping the Ethics of Generative AI: A Comprehensive Scoping Review

Authors: Hagendorff
Year: 2024
Type: Preprint

DOI: 10.48550/arXiv.2402.08323
URL: https://arxiv.org/abs/2402.08323

Original source

MIT AI Risk Repository

Open the public repository used for AI risk records and taxonomy fields.

Repository: https://airisk.mit.edu/

Suggested mitigations

Control Access to AI Models and Data in Production

Generative AI Guardrails

Generative AI Guidelines

Generative AI Model Alignment

AI Telemetry Logging

Input and Output Validation for AI Agent Components

Privileged AI Agent Permissions Configuration

Single-User AI Agent Permissions Configuration

AI Agent Tools Permissions Configuration

Human In-the-Loop for AI Agent Actions

Restrict AI Agent Tool Invocation on Untrusted Data

Segmentation of AI Agent Components

Memory Hardening

Limit Model Artifact Release

Control Access to AI Models and Data at Rest

Sanitize Training Data

Validate AI Model

AI Bill of Materials

Maintain AI Dataset Provenance

Code Signing

Verify AI Artifacts

Model Hardening

Use Ensemble Methods

Input Restoration

Adversarial Input Detection