Goal

Evade measures that detect potentially malicious instructions.

Impact

Can expose confidential or personally identifiable information (PII) used to train the model.

Can aid other attacks, such as jailbreaks, SQL injection, and cross-site scripting (XSS).

How do these attacks work?

Example Threat Scenario

A financial institution allows customers to upload documents to support loan applications. The firm uses an AI-powered system to process these documents and extract relevant information. The AI system interacts with a traditional document-processing system that is vulnerable to SQL injection. The entire system is protected by a Web Application Firewall (WAF) designed to detect and block malicious inputs such as SQL injection or remote code execution (RCE) payloads. However, a sophisticated attacker discovers a way to encode and hide harmful instructions within a seemingly innocuous document. The attacker crafts a loan application document that contains hidden, encoded prompts. These encoded instructions are designed to pass through the WAF undetected and, when processed by the AI system, to turn into SQL injection commands that exploit the vulnerable document-processing system. If the attack succeeds, the AI could be tricked into facilitating the execution of unauthorized database queries, potentially exposing sensitive financial or personal data or manipulating the loan approval process.
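The weakness this scenario exploits is that a signature-based filter such as a WAF matches patterns against the raw text it receives, so an instruction that only becomes recognizable after decoding passes through. The Python (3.9+) sketch below illustrates that gap under simplifying assumptions: the `SIGNATURES` regex list is a hypothetical stand-in for a real WAF rule set, Base64 is assumed as the encoding (attackers may use others), and `naive_scan`, `decoded_views`, and `normalizing_scan` are illustrative names, not any product's API.

```python
import base64
import re

# Hypothetical signatures standing in for a WAF rule set (illustrative only).
SIGNATURES = [
    re.compile(r"'\s*or\s*'1'\s*=\s*'1", re.IGNORECASE),
    re.compile(r"\bunion\s+select\b", re.IGNORECASE),
    re.compile(r";\s*drop\s+table\b", re.IGNORECASE),
]

# Tokens that look like Base64: 8+ alphabet characters plus optional padding.
B64_TOKEN = re.compile(r"[A-Za-z0-9+/]{8,}={0,2}")


def naive_scan(text: str) -> bool:
    """Return True if any signature matches the raw text as received."""
    return any(sig.search(text) for sig in SIGNATURES)


def decoded_views(text: str) -> list[str]:
    """Return the raw text plus UTF-8 decodings of Base64-looking tokens."""
    views = [text]
    for token in B64_TOKEN.findall(text):
        try:
            views.append(base64.b64decode(token, validate=True).decode("utf-8"))
        except ValueError:
            # binascii.Error and UnicodeDecodeError both subclass ValueError:
            # the token was not valid Base64 or did not decode to text.
            continue
    return views


def normalizing_scan(text: str) -> bool:
    """Return True if any signature matches the raw text or a decoded view."""
    return any(naive_scan(view) for view in decoded_views(text))


if __name__ == "__main__":
    payload = "' OR '1'='1"
    hidden = base64.b64encode(payload.encode("utf-8")).decode("ascii")
    document = f"Please review the attached statement. Ref: {hidden}"
    print(naive_scan(payload))         # True: the plain payload matches a signature
    print(naive_scan(document))        # False: the encoded payload slips past
    print(normalizing_scan(document))  # True: decoding before scanning catches it
```

Decoding candidate encodings before scanning narrows the gap but does not close it: an attacker can nest encodings, choose a scheme the filter does not try, or rely on the AI system itself to reassemble the instructions after the filter has run. The vulnerable downstream system therefore cannot depend on the WAF alone.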

Remediation

Further Reading

OWASP ML Top 10: Model Inversion