PromptRiskDB - Threat intelligence atlas
Discover AI Model Outputs - AI Security Technique

Adversaries may discover model outputs, such as class scores, that the system does not need in order to function and that are not intended for the end user. These outputs may appear in logs or in API responses, and they can help the adversary identify weaknesses in the model and develop attacks.
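
From the defender's side, one way to check whether a response exposes more than the end user needs is to compare its fields against an allowlist. The sketch below is a minimal illustration under assumed names: the field names (`label`, `class_scores`, `logits`, `request_id`) and the `ALLOWED_FIELDS` set are hypothetical and are not taken from any particular API.

```python
import json

# Hypothetical allowlist: the only fields the end user is assumed to need.
ALLOWED_FIELDS = {"label", "request_id"}


def find_extra_fields(response_body: str, allowed=ALLOWED_FIELDS) -> list[str]:
    """Return the top-level JSON fields that are not on the allowlist, sorted."""
    payload = json.loads(response_body)
    return sorted(key for key in payload if key not in allowed)


# Example body from an imagined classification endpoint (made-up values).
example = json.dumps({
    "request_id": "abc123",
    "label": "benign",
    "class_scores": {"benign": 0.91, "malicious": 0.09},
    "logits": [2.3, -0.1],
})

print(find_extra_fields(example))  # ['class_scores', 'logits']
```

The same check could be pointed at log lines instead of API responses, provided they are structured as JSON.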

AI Security Technique, demonstrated, Discovery

Record summary

A quick snapshot of what this page covers.

Tactics: 1. Attacker goals connected to this method.
Mitigations: 4. Defenses that may help against this attack.
AI risks: 0. Research-backed risks connected to this topic.

Attack context

How this AI attack works in practice.

ATLAS ID: AML.T0063
Priority score: 52
Maturity: demonstrated
Tactic: Discovery

Mitigations

Defenses that may help against this attack.

AML.M0017 - AI Model Distribution Methods

Lifecycle: Deployment
Category: Policy

Avoiding the deployment of models to edge devices reduces an adversary's ability to collect sensitive information about the model outputs.

AML.M0012 - Encrypt Sensitive Information

Lifecycle: Data Preparation, ML Model Engineering + 1 more
Category: Technical - Cyber

Encrypting model outputs can prevent adversaries from discovering sensitive information about the AI-enabled system or its operations.

AML.M0002 - Passive AI Output Obfuscation

Lifecycle: Deployment, ML Model Evaluation
Category: Technical - ML

Obfuscating model outputs can prevent adversaries from collecting sensitive information about the model outputs.
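
As a rough illustration of output obfuscation, a service could return only the top label and a coarse confidence bucket instead of the full score vector. The sketch below assumes a hypothetical `obfuscate_output` helper and a bucket width of 0.25. Both are illustrative choices, not part of the mitigation as published.

```python
import math


def obfuscate_output(class_scores: dict[str, float], step: float = 0.25) -> dict:
    """Reduce a full score vector to a top-1 label and a coarse confidence bucket."""
    if not class_scores:
        raise ValueError("class_scores must not be empty")
    # Keep only the highest-scoring class; all other scores are dropped.
    label, score = max(class_scores.items(), key=lambda item: item[1])
    # Round the score down to a multiple of `step` so that small changes
    # in the underlying score are not visible in the response.
    bucket = math.floor(score / step) * step
    return {"label": label, "confidence_bucket": round(bucket, 2)}


# Two different raw outputs (made-up values) map to the same response.
print(obfuscate_output({"benign": 0.91, "malicious": 0.09}))
print(obfuscate_output({"benign": 0.97, "malicious": 0.03}))
```

This reduces what an observer can learn from each query but does not remove it, since the label itself stays visible.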

Case studies

Examples from public reports and exercises.

ProofPoint Evasion

Type: exercise
Date: 2019-09-09

Proof Pudding (CVE-2019-20634) is a code repository that describes how ML researchers evaded ProofPoint's email protection system by first building a copycat email protection ML model and then using the insights from it to bypass the live system. More specifically, the insights allowed the researchers to craft malicious emails that received favorable scores and went undetected by the system. Each word in an email is scored numerically based on multiple variables, and if the overall score of the email is too low, ProofPoint outputs an error and labels it as spam.
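
The scoring scheme described above can be pictured with a toy model. The sketch below is not ProofPoint's algorithm: the word scores, the default score, the averaging rule, and the threshold are all made-up assumptions, chosen only to show how per-word scores could roll up into an overall score that is compared against a cutoff.

```python
# Hypothetical per-word scores; a real system would derive them from many variables.
WORD_SCORES = {"invoice": 0.6, "meeting": 0.9, "free": 0.1, "winner": 0.05}
DEFAULT_SCORE = 0.5   # assumed score for words not in the table
SPAM_THRESHOLD = 0.4  # assumed cutoff: below this, the email is labeled spam


def score_email(text: str) -> float:
    """Average the per-word scores of a whitespace-tokenized email."""
    words = text.lower().split()
    if not words:
        return DEFAULT_SCORE
    return sum(WORD_SCORES.get(w, DEFAULT_SCORE) for w in words) / len(words)


def classify(text: str) -> str:
    """Label an email 'spam' when its overall score falls below the threshold."""
    return "spam" if score_email(text) < SPAM_THRESHOLD else "ok"


print(classify("free winner"))      # spam
print(classify("meeting invoice"))  # ok
```

In a model of this shape, anyone who can observe the overall score can see how individual words move it, which is why exposing such scores matters.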

Bypassing Cylance's AI Malware Detection

Type: exercise
Date: 2019-09-07

Researchers at Skylight created a universal bypass string that, when appended to a malicious file, evades detection by Cylance's AI malware detector.

Source

Where this page's information comes from.