Model weight leak

Record summary

A quick snapshot of what this page covers.

Techniques: 9. Attack methods connected to this risk.

Mitigations: 14. Defenses that may help with related attacks.

Domain: 2. Privacy & Security. The broad risk area this belongs to.

Risk profile

How this risk is described and categorized.

"Model weights or access to them can be leaked when initial access is granted only to a select group of individuals, such as institutional researchers [209]. This risk can increase as more people gain access, and identifying the source of the leak becomes more difficult. The availability of leaked model weights makes various attacks on systems that use the leaked AI model easier to implement, such as finding adversarial examples, elicitation of dangerous capabilities, and extraction of confidential information present in the training data. The availability of model weights might also enable the misuse of the AI system using the leaked model to produce harmful or illegal content [67]."

Domain: 2. Privacy & Security

Subdomain: 2.2 > AI system security vulnerabilities and attacks

Entity: 1 - Human

Intent: 1 - Intentional

Timing: 2 - Post-deployment

Category: Cybersecurity

Subcategory: Model weight leak

Related techniques

Attack methods connected to this risk.

Suggested mitigations

Defenses that may help with related attacks.

Source

Research source for this risk, when available.

Included resource

Risk Sources and Risk Management Measures in Support of Standards for General-Purpose AI Systems

Authors: Gipiškis et al. Year: 2024. Type: Journal Article.

DOI: https://doi.org/10.48550/arXiv.2410.23472

URL: https://arxiv.org/abs/2410.23472

Original source

MIT AI Risk Repository

Open the public repository used for AI risk records and taxonomy fields.

Repository: https://airisk.mit.edu/

Suggested mitigations

Control Access to AI Models and Data at Rest

Encrypt Sensitive Information

Control Access to AI Models and Data in Production

Generative AI Guardrails

Generative AI Guidelines

Generative AI Model Alignment

AI Telemetry Logging

Input and Output Validation for AI Agent Components

Model Hardening

Use Ensemble Methods

Use Multi-Modal Sensors

Input Restoration

Adversarial Input Detection

Deepfake Detection
