Overview
Image evasion attacks perturb an input image so that a model fails to classify it correctly, typically while the change remains imperceptible or innocuous to a human observer.
Goal
Evade correct classification of a given image by a model.
Impact
Can enable attackers to bypass content moderation filters, or to deceive image classification models used in sensitive applications.
How do these attacks work?
Image evasion attacks alter images using various perturbation methods: modifying specific pixels (often guided by the model's gradients), or applying pre-defined or attacker-computed patterns such as adversarial patches. A sketch of the gradient-based approach follows.
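As a concrete illustration of the pixel-level approach, below is a minimal sketch of the Fast Gradient Sign Method (FGSM), one well-known gradient-based perturbation technique. It assumes PyTorch with a pretrained torchvision ResNet-18 as a stand-in target model; the input file name and epsilon value are illustrative, not taken from any specific system.

```python
import torch
import torch.nn.functional as F
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

# Stand-in target model: a pretrained ImageNet classifier.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.eval()
for p in model.parameters():
    p.requires_grad_(False)

preprocess = T.Compose([T.Resize(256), T.CenterCrop(224), T.ToTensor()])
normalize = T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])

image = preprocess(Image.open("input.jpg")).unsqueeze(0)  # hypothetical input file
image.requires_grad_(True)

# Use the model's own prediction as the label to evade.
logits = model(normalize(image))
label = logits.argmax(dim=1)

# Gradient of the loss with respect to the input pixels.
loss = F.cross_entropy(logits, label)
loss.backward()

# FGSM: nudge every pixel in the direction that increases the loss,
# bounded by epsilon so the change stays visually subtle.
epsilon = 8 / 255
adversarial = (image + epsilon * image.grad.sign()).clamp(0, 1).detach()

print("original:", label.item(),
      "adversarial:", model(normalize(adversarial)).argmax(dim=1).item())
```

Stronger attacks such as PGD iterate this step, but the principle is the same: the model's own gradients reveal which pixel changes most degrade its prediction.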
Example Threat Scenario
A military contractor has developed and released an aerial drone detection system that relies on machine learning to recognize drones in aerial images. The system processes input from cameras or LiDAR to detect the shape, size, and other drone-specific features, and returns an alert or a classification with a confidence score indicating that a drone is present. An attacking nation state begins adding small, strategically crafted stickers or patches to its drones that appear to the human eye as innocuous, random patterns. These patches are computed through a staged image evasion attack: having obtained the underlying target model, whether via technology exfiltration or software theft, the attacker identifies which perturbations most degrade the model's ability to detect drones and computes the minimal perturbation required to fool it. These offline-computed perturbations transfer to the real world when applied to the objects the attacker wishes to mask from the target model, thereby tricking the aerial drone detection system.
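A rough sketch of how such a patch might be computed offline is shown below, under the assumption that the attacker has a runnable copy of the target model. The tiny stand-in network, the "drone" label index, and all dimensions are hypothetical placeholders for illustration only.

```python
import torch
import torch.nn as nn

# Stand-in for the exfiltrated target model; in the scenario this would be
# the contractor's drone detector. DRONE_CLASS is an assumed label index.
DRONE_CLASS = 0
model = nn.Sequential(
    nn.Conv2d(3, 8, 3, stride=2), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(8, 2),
)
model.eval()
for p in model.parameters():
    p.requires_grad_(False)

# Images of the drone collected by the attacker (random tensors here).
images = torch.rand(16, 3, 64, 64)

# Optimize a small patch so that, wherever it is pasted, the model's
# confidence in the drone class drops.
patch = torch.rand(3, 16, 16, requires_grad=True)
opt = torch.optim.Adam([patch], lr=0.05)

for step in range(200):
    x = images.clone()
    # Paste the patch at a random location each step; position invariance
    # helps the perturbation survive as a physical sticker.
    i = torch.randint(0, 64 - 16, (1,)).item()
    j = torch.randint(0, 64 - 16, (1,)).item()
    x[:, :, i:i + 16, j:j + 16] = patch.clamp(0, 1)

    # Minimize the drone logit, pushing the detector away from the drone class.
    loss = model(x)[:, DRONE_CLASS].mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

# The optimized patch would then be printed and applied to the drone.
```

Unlike the per-pixel perturbation above, a patch is optimized to work across many images and positions, which is what makes it viable as a physical sticker rather than a digital-only modification.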