Attack Goal

KnockOffNets is a model stealing attack that recreates a target model through training: the attacker queries the target, records its outputs, and trains a stolen model on the resulting query-output pairs.

Example

Consider a machine learning model that predicts an illness based on a person’s medical history. An attacker without direct access to a person’s medical records could:

- Query the model with a set of synthetic inputs
- Analyse the model’s outputs (various illness risk scores)
- Use these outputs to train a “stolen model” which captures the behaviour of the target model
- Employ the stolen model to infer other details about a person’s medical history
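The querying and training steps in this example can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the KnockOffNets reference implementation: the target is a hypothetical three-feature logistic model, and `target_predict`, `synthetic_inputs`, `train_stolen_model` and the secret weights are all names and values invented for the sketch. The attacker only ever calls `target_predict` and fits its own model to the returned risk scores.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical target model. The attacker can call target_predict
# but never sees these weights.
_SECRET_W = [1.5, -2.0, 0.7]
_SECRET_B = -0.3


def target_predict(x):
    """Black-box API: returns an illness risk score in [0, 1]."""
    return sigmoid(sum(w * xi for w, xi in zip(_SECRET_W, x)) + _SECRET_B)


def synthetic_inputs(n, dim, rng):
    """Attacker-generated inputs; no real medical records are needed."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(n)]


def train_stolen_model(queries, scores, epochs=200, lr=0.5):
    """Fit a logistic model to the target's scores with batch gradient descent."""
    dim = len(queries[0])
    n = len(queries)
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for x, y in zip(queries, scores):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = p - y  # cross-entropy gradient, using the score as a soft label
            for j in range(dim):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def stolen_predict(model, x):
    w, b = model
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def agreement(model, inputs, threshold=0.5):
    """Fraction of inputs on which the stolen and target models agree."""
    same = sum(
        (target_predict(x) >= threshold) == (stolen_predict(model, x) >= threshold)
        for x in inputs
    )
    return same / len(inputs)


rng = random.Random(0)
queries = synthetic_inputs(500, 3, rng)          # step 1: synthetic queries
scores = [target_predict(x) for x in queries]    # step 2: collect risk scores
stolen = train_stolen_model(queries, scores)     # step 3: train the stolen model
held_out = synthetic_inputs(200, 3, rng)
print(f"agreement on held-out inputs: {agreement(stolen, held_out):.2f}")
```

A real attack would face an unknown architecture and a limited query budget. The sketch gives the stolen model the same form as the target only to keep the example short.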

Remediation