Transparency - Explainability

Record summary

A quick snapshot of what this page covers.

Techniques: 0. Attack methods connected to this risk.

Mitigations: 0. Defenses that may help with related attacks.

Domain: 7. AI System Safety, Failures, & Limitations. The broad risk area this belongs to.

Risk profile

How this risk is described and categorized.

Transparency is a multifaceted concept: the term refers both to technical explainability and to organizational openness. On the technical side, papers underscore the need for mechanistic interpretability and for explaining the internal mechanisms of generative models. On the organizational side, transparency covers practices such as informing users about the capabilities and shortcomings of models, and adhering to documentation and reporting requirements for data collection processes and risk evaluations.

Domain: 7. AI System Safety, Failures, & Limitations

Subdomain: 7.4 > Lack of transparency or interpretability

Entity: 4 - Not coded

Intent: 4 - Not coded

Timing: 4 - Not coded

Category: Transparency - Explainability

Subcategory: n/a

Related techniques

Attack methods connected to this risk.

No linked attack methods. No AI attack method is connected to this risk in the current data.

Suggested mitigations

Defenses that may help with related attacks.

No propagated mitigations. No defense is available through the connected attack methods.

Source

Research source for this risk, when available.

Included resource

Mapping the Ethics of Generative AI: A Comprehensive Scoping Review

Authors: Hagendorff

Year: 2024

Type: Preprint

DOI: 10.48550/arXiv.2402.08323

URL: https://arxiv.org/abs/2402.08323

Original source

MIT AI Risk Repository

Open the public repository used for AI risk records and taxonomy fields.

Repository: https://airisk.mit.edu/