Reduce the information output by a model (confidence values, input size, token limits) to limit an adversary's ability to extract information and optimize attacks against the model. Such recommendations include confidence rounding and gradient masking.
Explanation
How it works
How to implement
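As a minimal sketch of confidence rounding, one of the techniques named above: the idea is to coarsen the probability scores a model returns so that repeated queries reveal less of the decision surface. The function name, rounding precision, and renormalization step here are illustrative assumptions, not a prescribed implementation.

```python
def round_confidences(probs, decimals=1):
    """Coarsen model confidence scores before returning them to a client.

    Rounding to a small number of decimals (illustrative default: 1)
    reduces the fine-grained signal an adversary could use to estimate
    gradients or reconstruct the model's decision boundary.
    """
    rounded = [round(p, decimals) for p in probs]
    total = sum(rounded)
    # Renormalize so the rounded scores still sum to 1 when possible.
    return [r / total for r in rounded] if total > 0 else rounded

# Example: a fine-grained softmax output becomes a coarse one.
raw = [0.4721, 0.5183, 0.0096]
coarse = round_confidences(raw)
print(coarse)
```

A common complementary measure is to return only the top-1 label (or top-k labels without scores), which removes the confidence channel entirely at the cost of less informative responses for legitimate users.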