Adversarial Concept Erasure in Kernel Space

Abstract

We propose a kernelization of the linear concept-removal objective and show that it effectively guards against the ability of certain nonlinear adversaries to recover the concept. Interestingly, our findings suggest that the division between linear and nonlinear models is overly simplistic: when considering the concept of binary gender and its neutralization, we do not find a single kernel space that exclusively contains all the concept-related information.

Publication
Adversarial Concept Erasure in Kernel Space