Goal

Text evasion attacks aim to make a model misclassify input text, typically by modifying the text so that it no longer triggers the correct classification.

Impact

Successful evasion attacks against text models can enable attackers to evade spam filters, malware detection systems, or other text-based security classifiers.

How do these attacks work?

These attacks use word importance ranking and substitution techniques to alter the most critical words within the text. Some can produce changes that are imperceptible to a human reader.
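The two steps described above, ranking words by importance and then substituting the highest-ranked ones, can be sketched in a few lines. This is a toy illustration under stated assumptions, not any specific published attack: the keyword-weight scorer stands in for a real classifier, and `SPAM_WEIGHTS`, `SUBSTITUTES`, `THRESHOLD`, and all function names are hypothetical.

```python
# Toy keyword-weight scorer standing in for a real text classifier (assumption).
SPAM_WEIGHTS = {"free": 4, "prize": 3, "winner": 3, "claim": 2}
# Hypothetical substitution table; real attacks draw on synonym or embedding lookups.
SUBSTITUTES = {"free": "complimentary", "prize": "reward",
               "winner": "recipient", "claim": "collect"}
THRESHOLD = 5  # a score at or above this value is flagged


def spam_score(words):
    """Sum the weights of known keywords (case-insensitive)."""
    return sum(SPAM_WEIGHTS.get(w.lower(), 0) for w in words)


def is_flagged(text):
    return spam_score(text.split()) >= THRESHOLD


def word_importance(words):
    """Leave-one-out importance: how much the score drops without each word."""
    base = spam_score(words)
    return [base - spam_score(words[:i] + words[i + 1:])
            for i in range(len(words))]


def evade(text):
    """Substitute the most important words first until the text is no longer flagged."""
    words = text.split()
    importance = word_importance(words)
    order = sorted(range(len(words)), key=lambda i: importance[i], reverse=True)
    for i in order:
        if spam_score(words) < THRESHOLD:
            break  # already below the threshold, stop perturbing
        substitute = SUBSTITUTES.get(words[i].lower())
        if substitute is not None:
            words[i] = substitute
    return " ".join(words)


original = "Claim your free prize now"
adversarial = evade(original)
print(is_flagged(original), is_flagged(adversarial), adversarial)
```

Replacing only the highest-importance words keeps the number of changes small, which is what makes such perturbations hard for a human reader to notice.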

Example Threat Scenario

A financial institution allows customers to upload documents to support loan applications. The firm uses an AI-powered system to process these documents and extract relevant information. The AI system interacts with a traditional document processing system that is vulnerable to SQL injection attacks. The whole system is protected by a Web Application Firewall (WAF) designed to detect and block malicious inputs such as SQL injection or Remote Code Execution (RCE) payloads.

However, a sophisticated attacker discovers a method to encode and hide harmful instructions within a seemingly innocuous document. The attacker crafts a loan application document that contains hidden, encoded prompts. These encoded instructions are designed to bypass the WAF and, when processed by the AI system, transform into SQL injection commands that exploit the vulnerable document processing system.

If successful, this could trick the AI system into facilitating the execution of unauthorized database queries, potentially exposing sensitive financial or personal data or manipulating loan approval processes.
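The bypass in this scenario relies on the filter inspecting only the raw input while a later component recovers the hidden content. The following is a minimal sketch of that gap, assuming a simplified signature-based check and Base64 as the encoding. `SQLI_PATTERN` and `naive_filter_blocks` are hypothetical and do not represent any real WAF rule set, and the explicit decode step stands in for whatever transformation the AI system would perform.

```python
import base64
import re

# Hypothetical signature list; real WAF rule sets are far larger (assumption).
SQLI_PATTERN = re.compile(r"(?i)\bunion\s+select\b|\bor\s+1\s*=\s*1\b|--")


def naive_filter_blocks(text):
    """Return True if the raw text matches a known SQL injection signature."""
    return SQLI_PATTERN.search(text) is not None


payload = "' OR 1=1 --"
encoded = base64.b64encode(payload.encode("utf-8")).decode("ascii")

# The plain payload matches a signature, but its encoded form does not.
blocked_plain = naive_filter_blocks(payload)
blocked_encoded = naive_filter_blocks(encoded)

# A downstream component that decodes the content recovers the original payload.
decoded = base64.b64decode(encoded).decode("utf-8")
blocked_after_decode = naive_filter_blocks(decoded)

print(blocked_plain, blocked_encoded, blocked_after_decode)
```

The same check that misses the encoded form catches the payload once it is decoded, which suggests that inspection needs to happen on the content the downstream system actually acts on.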

Remediation

Further Reading

MITRE ATLAS AML.T0015