All Records

AI security records indexed from public vulnerability, risk, and attack datasets.

Showing 621-640 of 3623 records

Limitations of Reward Modeling

Limitations of Reward Modeling is an AI risk in 7. AI System Safety, Failures, & Limitations focused on 7.1 > AI pursuing its own goals in conflict with huma...

Limitations of Human Feedback

Limitations of Human Feedback is an AI risk in 7. AI System Safety, Failures, & Limitations focused on 7.0 > AI system safety, failures, & limitations. It is...

Reward Tampering

Reward Tampering is an AI risk in 7. AI System Safety, Failures, & Limitations focused on 7.1 > AI pursuing its own goals in conflict with human goals or val...

Goal Misgeneralization

Goal Misgeneralization is an AI risk in 7. AI System Safety, Failures, & Limitations focused on 7.1 > AI pursuing its own goals in conflict with human goals...

Reward Hacking

Reward Hacking is an AI risk in 7. AI System Safety, Failures, & Limitations focused on 7.1 > AI pursuing its own goals in conflict with human goals or value...

Causes of Misalignment

Causes of Misalignment is an AI risk in 7. AI System Safety, Failures, & Limitations focused on 7.1 > AI pursuing its own goals in conflict with human goals...

Adult content

Adult content is an AI risk in 1. Discrimination & Toxicity focused on 1.2 > Exposure to toxic content. It is most relevant during 3 - Other.

Information on harmful, immoral, or illegal activity

Information on harmful, immoral, or illegal activity is an AI risk in 1. Discrimination & Toxicity focused on 1.2 > Exposure to toxic content. It is most rel...

Misinformation

Misinformation is an AI risk in 3. Misinformation focused on 3.1 > False or misleading information. It is most relevant during 3 - Other.

Alignment risks

Alignment risks is an AI risk in 7. AI System Safety, Failures, & Limitations focused on 7.1 > AI pursuing its own goals in conflict with human goals or valu...

AI Development

AI Development is an AI risk in 7. AI System Safety, Failures, & Limitations focused on 7.2 > AI possessing dangerous capabilities. It is most relevant durin...

Long-horizon Planning

Long-horizon Planning is an AI risk in 7. AI System Safety, Failures, & Limitations focused on 7.2 > AI possessing dangerous capabilities. It is most relevan...

Political Strategy

Political Strategy is an AI risk in 4. Malicious Actors & Misuse focused on 4.1 > Disinformation, surveillance, and influence at scale. It is most relevant d...

Deception

Deception is an AI risk in 7. AI System Safety, Failures, & Limitations focused on 7.2 > AI possessing dangerous capabilities. It is most relevant during 3 -...

Persuasion and manipulation

Persuasion and manipulation is an AI risk in 4. Malicious Actors & Misuse focused on 4.1 > Disinformation, surveillance, and influence at scale. It is most r...

Extreme Risks

Extreme Risks is an AI risk in 7. AI System Safety, Failures, & Limitations focused on 7.0 > AI system safety, failures, & limitations. It is most relevant d...

Robustness

Robustness is an AI risk in 7. AI System Safety, Failures, & Limitations focused on 7.3 > Lack of capability or robustness. It is most relevant during 3 - Ot...

Machine ethics

Machine ethics is an AI risk in 7. AI System Safety, Failures, & Limitations focused on 7.3 > Lack of capability or robustness. It is most relevant during 3...

Bias

Bias is an AI risk in 1. Discrimination & Toxicity focused on 1.1 > Unfair discrimination and misrepresentation. It is most relevant during 3 - Other.

Toxicity generation

Toxicity generation is an AI risk in 1. Discrimination & Toxicity focused on 1.2 > Exposure to toxic content. It is most relevant during 3 - Other.