PromptRiskDB: Threat intelligence atlas
AI Risk

Adversarial Optimization:

"Jailbreak attacks can be discovered by performing manual or auto- mated adversarial optimization against a proxy objective that is noisily correlated with the success of a jailbreak. These are mostly gradient-based attacks (Zou et al., 2023b; Shin et al., 2020) as described in the previous two challenges, but gradient-free methods also exist (Prasad et al., 2022; Deng et al., 2022; Lapid et al., 2023)."


Record summary

A quick snapshot of what this page covers.

Techniques: 0. Attack methods connected to this risk.
Mitigations: 0. Defenses that may help with related attacks.
Domain: n/a. The broad risk area this belongs to.

Risk profile

How this risk is described and categorized.

Domain: n/a
Subdomain: X.1 > Excluded
Entity: 4 - Not coded
Intent: 4 - Not coded
Timing: 4 - Not coded
Category: Jailbreaks and Prompt Injections Threaten Security of LLMs
Subcategory: Adversarial Optimization

Suggested mitigations

Defenses that may help with related attacks.

No propagated mitigations. No defense is available through the connected attack methods.

Source

Research source for this risk, when available.