Overview

Goal

Text evasion attacks aim to evade correct classification by a model.

Impact

Successful evasion attacks against text models can enable attackers to evade spam filters, malware detection systems, or other text-based security classifiers.

How do these attacks work?

These attacks use various word importance selection and substitution techniques to alter critical words with in the text. With some being able to produce changes that imperceptible to a human reader.

Example Threat Scenario

A financial institution allows customers to upload documents to support loan applications. The firm uses an AI-powered system to process these documents and extract relevant information. The AI system interacts with a traditional software system for processing documents that is vulnerable to SQL Injection attacks. This whole system is protected by a Web Application Firewall (WAF) designed to detect and block malicious inputs like SQL injections or Remote Code Executions (RCE). However, a sophisticated attacker discovers a method to encode and hide harmful instructions within a seemingly innocuous document. The attacker crafts a loan application document that contains hidden, encoded prompts. These encoded instructions are designed to bypass the WAF and, when processed by the AI system, transform into SQL injection commands that exploit the vulnerable document processing system. If successful, this could trick the AI into facilitating execution of unauthorized database queries, potentially exposing sensitive financial or personal data, or manipulating loan approval processes.

Remediation

Model Hardening

View Guidelines

Input Restoration

View Guidelines

Ensemble Methods

View Guidelines

Welcome

User Guide

Attack Library

Remediation Library

Goal

Impact

How do these attacks work?

Example Threat Scenario

Remediation

Model Hardening

Input Restoration

Ensemble Methods

Further Reading

Welcome

User Guide

Attack Library

Remediation Library

​Goal

​Impact

​How do these attacks work?

​Example Threat Scenario

​Remediation

Model Hardening

Input Restoration

Ensemble Methods

​Further Reading

Goal

Impact

How do these attacks work?

Example Threat Scenario

Remediation

Further Reading