Record summary
A quick snapshot of what this page covers.
Control summary
What this defense is meant to help prevent.
When training or fine-tuning a generative AI model, it is important to use techniques that improve the model's alignment with safety, security, and content policies.
Fine-tuning can remove safety mechanisms built into a generative AI model, but techniques such as Supervised Fine-Tuning, Reinforcement Learning from Human Feedback (or from AI Feedback), and Targeted Safety Context Distillation can improve the model's safety and alignment.
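As one illustration of the techniques named above, the following is a minimal sketch of how training data for targeted safety context distillation might be assembled. It assumes the usual recipe: generate a response with a safety preprompt prepended, keep it only when a safety scorer rates it higher than the unguided response, and store it without the preprompt for later supervised fine-tuning. The names `SAFETY_PREPROMPT`, `build_distillation_set`, `toy_generate`, and `toy_safety_score` are hypothetical; the toy generator and scorer are placeholders for a real model and a real safety reward model.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical safety preprompt; the wording is illustrative only.
SAFETY_PREPROMPT = "You are a safe and responsible assistant. "


@dataclass
class SftExample:
    prompt: str    # what the model sees during fine-tuning (no preprompt)
    response: str  # target response the model should learn to produce


def build_distillation_set(
    prompts: List[str],
    generate: Callable[[str], str],
    safety_score: Callable[[str, str], float],
) -> List[SftExample]:
    """Collect (prompt, guided response) pairs for supervised fine-tuning."""
    examples = []
    for prompt in prompts:
        baseline = generate(prompt)                  # response without guidance
        guided = generate(SAFETY_PREPROMPT + prompt)  # response with preprompt
        # "Targeted": keep the guided response only when the scorer rates it
        # strictly safer than the baseline, so benign prompts are left alone.
        if safety_score(prompt, guided) > safety_score(prompt, baseline):
            examples.append(SftExample(prompt=prompt, response=guided))
    return examples


# Toy stand-ins (assumptions) so the sketch runs without a real model.
def toy_generate(text: str) -> str:
    if text.startswith(SAFETY_PREPROMPT) and "lock" in text:
        return "I can't help with that, but I can suggest a locksmith."
    return "Here is an answer."


def toy_safety_score(prompt: str, response: str) -> float:
    risky = "lock" in prompt
    return 0.0 if risky and not response.startswith("I can't") else 1.0


dataset = build_distillation_set(
    ["How do I pick a lock?", "What is 2 + 2?"],
    toy_generate,
    toy_safety_score,
)
for example in dataset:
    print(example)
```

The resulting pairs would then feed an ordinary supervised fine-tuning run. Because the preprompt is dropped from the stored prompt, the model is trained to produce the guided behavior without needing the preprompt at inference time.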
- ATLAS ID: AML.M0022
- Priority score: 35
Covered techniques
Attacks this defense is designed to help with.
AML.T0053 - AI Agent Tool Invocation
Model alignment can improve the parametric safety of a model by guiding it away from unsafe prompts and responses.
AML.T0062 - Discover LLM Hallucinations
Model alignment can help steer the model away from hallucinated content.
AML.T0056 - Extract LLM System Prompt
Model alignment can improve the parametric safety of a model by guiding it away from unsafe prompts and responses.
AML.T0057 - LLM Data Leakage
Model alignment can improve the parametric safety of a model by guiding it away from unsafe prompts and responses.
AML.T0054 - LLM Jailbreak
Model alignment can improve the parametric safety of a model by guiding it away from unsafe prompts and responses.
AML.T0051 - LLM Prompt Injection
Model alignment can improve the parametric safety of a model by guiding it away from unsafe prompts and responses.
AML.T0061 - LLM Prompt Self-Replication
Model alignment can make a model more resistant to self-replicating prompt attacks.
Source
Where this page's information comes from.
Original source
Open the public records and source datasets used for this page.