Foundationality May Cause Correlated Failures

Record summary

A quick snapshot of what this page covers.

Techniques0Attack methods connected to this risk.

Mitigations0Defenses that may help with related attacks.

Domain7. AI System Safety, Failures, & LimitationsThe broad risk area this belongs to.

Risk profile

How this risk is described and categorized.

"Another important characteristic of LLM development is foundationality — due to the expense of large- scale pretraining, many deployed instances share similar or identical learned components. Foundation- ality may both be a blessing and a curse. On the one hand, it may be possible to exploit the similarity in the design of LLM-agents to facilitate cooperation (Critch et al., 2022; Conitzer and Oesterheld, 2023; Oesterheld et al., 2023). On the other hand, foundationality may leave LLM-agents vulnerable to correlated failures both in terms of safety and capabilities due to increased output homogenization (Bommasani et al., 2022)."

Domain7. AI System Safety, Failures, & Limitations

Subdomain7.6 > Multi-agent risks

Entity3 - Other

Intent3 - Other

Timing2 - Post-deployment

CategoryMulti-Agent Safety Is Not Assured by Single-Agent Safety

SubcategoryFoundationality May Cause Correlated Failures

Related techniques

Attack methods connected to this risk.

No linked attack methods. No AI attack method is connected to this risk in the current data.

Suggested mitigations

Defenses that may help with related attacks.

No propagated mitigations. No defense is available through the connected attack methods.

Source

Research source for this risk, when available.

Included resource

Foundational Challenges in Assuring Alignment and Safety of Large Language Models

AuthorsAnwar et al.Year2024TypePreprint

DOI10.48550/arXiv.2404.09932 URLhttps://arxiv.org/abs/2404.09932

Original source

MIT AI Risk Repository

Open the public repository used for AI risk records and taxonomy fields.

Repositoryhttps://airisk.mit.edu/