Fast Axiomatic Attribution for Neural Networks

¹Department of Computer Science, TU Darmstadt  ²hessian.AI

NeurIPS 2021

Keywords: deep learning, interpretability, attributions, explanations, attribution priors

TL;DR: For various ReLU DNNs without bias terms, we show that axiomatic feature attributions from Integrated Gradients can be computed efficiently in closed form.

Abstract: Mitigating the dependence on spurious correlations present in the training dataset is a quickly emerging and important topic of deep learning. Recent approaches incorporate priors on the feature attributions of a deep neural network (DNN) into the training process to reduce the dependence on unwanted features. However, until now one needed to trade off high-quality attributions, satisfying desirable axioms, against the time required to compute them. This in turn either led to long training times or ineffective attribution priors. In this work, we break this trade-off by considering a special class of efficiently axiomatically attributable DNNs for which an axiomatic feature attribution can be computed with only a single forward/backward pass. We formally prove that nonnegatively homogeneous DNNs, here termed 𝒳-DNNs, are efficiently axiomatically attributable and show that they can be effortlessly constructed from a wide range of regular DNNs by simply removing the bias term of each layer. Various experiments demonstrate the advantages of 𝒳-DNNs, beating state-of-the-art generic attribution methods on regular DNNs for training with attribution priors.
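
The core claim can be sketched in a few lines. The derivation below is an informal outline, not the formal proof from the paper; it assumes a zero baseline for Integrated Gradients and that f is differentiable almost everywhere along the straight path from 0 to x.

```latex
% Assumption: f is nonnegatively homogeneous, i.e. f(\alpha x) = \alpha f(x) for all \alpha \ge 0.
\begin{align}
  % Differentiate both sides of f(\alpha x) = \alpha f(x) with respect to x_i and divide by \alpha > 0:
  \frac{\partial f}{\partial x_i}\bigg|_{\alpha x}
    &= \frac{\partial f}{\partial x_i}\bigg|_{x}
    \qquad \text{for all } \alpha > 0, \\
  % Integrated Gradients with zero baseline: the integrand no longer depends on \alpha.
  \mathrm{IG}_i(x)
    &= x_i \int_0^1 \frac{\partial f}{\partial x_i}\bigg|_{\alpha x} \, d\alpha
     = x_i \, \frac{\partial f}{\partial x_i}\bigg|_{x}, \\
  % Completeness follows from Euler's theorem for homogeneous functions and f(0) = 0.
  \sum_i x_i \, \frac{\partial f}{\partial x_i}\bigg|_{x}
    &= \frac{d}{d\alpha} f(\alpha x)\bigg|_{\alpha = 1}
     = f(x) = f(x) - f(0).
\end{align}
```

The right-hand side of the second line is Input×Gradient, which needs only one forward and one backward pass.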

Take Home

  • For nonnegatively homogeneous DNNs (𝒳-DNNs), Input×Gradient equals Integrated Gradients
  • Removing the bias term from ReLU DNNs is one possible way to obtain 𝒳-DNNs
  • Use 𝒳-DNNs when you need efficiently computable axiomatic attributions
  • Use 𝒳-DNNs for applications that require equivariance to contrast changes
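
The points above can be checked numerically on a toy network. The following is a minimal sketch in plain Python, not the paper's implementation: the two-layer ReLU network, its hand-picked weights, and all function names (`forward`, `gradient`, `input_x_gradient`, `integrated_gradients`) are hypothetical and chosen only for illustration. Integrated Gradients is approximated with a midpoint Riemann sum and a zero baseline.

```python
# Toy check: Input x Gradient vs. Integrated Gradients (zero baseline)
# on a two-layer ReLU network f(x) = w2 . relu(W1 x + b1).
# All weights below are arbitrary illustrative values, not from the paper.

def pre_activations(x, W1, b1):
    """Hidden-layer pre-activations W1 x + b1."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(W1, b1)]

def forward(x, W1, b1, w2):
    """Network output: weighted sum of ReLU activations."""
    return sum(w * max(0.0, p) for w, p in zip(w2, pre_activations(x, W1, b1)))

def gradient(x, W1, b1, w2):
    """Analytic gradient of the output with respect to the input x."""
    pre = pre_activations(x, W1, b1)
    gate = [w if p > 0 else 0.0 for w, p in zip(w2, pre)]  # ReLU on/off
    return [sum(g * row[i] for g, row in zip(gate, W1)) for i in range(len(x))]

def input_x_gradient(x, W1, b1, w2):
    """Input x Gradient: one gradient evaluation."""
    return [xi * gi for xi, gi in zip(x, gradient(x, W1, b1, w2))]

def integrated_gradients(x, W1, b1, w2, steps=200):
    """Midpoint Riemann-sum approximation of IG along the path 0 -> x."""
    total = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        g = gradient([alpha * xi for xi in x], W1, b1, w2)
        total = [t + gi for t, gi in zip(total, g)]
    return [xi * t / steps for xi, t in zip(x, total)]

W1 = [[1.0, -2.0], [0.5, 1.0]]
w2 = [1.0, -1.0]
x = [2.0, 0.5]
no_bias = [0.0, 0.0]      # bias-free network: nonnegatively homogeneous
with_bias = [-0.5, 1.0]   # regular network with bias terms

# Without bias terms, both attribution methods coincide.
ixg_free = input_x_gradient(x, W1, no_bias, w2)
ig_free = integrated_gradients(x, W1, no_bias, w2)
print("no bias   IxG:", ixg_free, " IG:", ig_free)

# Completeness: attributions sum to f(x) - f(0) = f(x).
print("no bias   sum(IxG):", sum(ixg_free), " f(x):", forward(x, W1, no_bias, w2))

# Homogeneity (contrast scaling): f(3x) = 3 f(x).
x_scaled = [3.0 * xi for xi in x]
print("no bias   f(3x):", forward(x_scaled, W1, no_bias, w2),
      " 3 f(x):", 3.0 * forward(x, W1, no_bias, w2))

# With bias terms, a hidden unit switches on along the path,
# so Input x Gradient no longer matches Integrated Gradients.
ixg_biased = input_x_gradient(x, W1, with_bias, w2)
ig_biased = integrated_gradients(x, W1, with_bias, w2)
print("with bias IxG:", ixg_biased, " IG:", ig_biased)
```

In the bias-free case the gradient is constant along the integration path, so a single gradient evaluation already gives the Integrated Gradients attribution. In the biased case the first hidden unit is inactive for the first half of the path, and the two attributions differ.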

Related Work

  • G. Erion, J. Janizek, P. Sturmfels, S. Lundberg, and S. Lee, "Improving Performance of Deep Learning Models with Axiomatic Attribution Priors and Expected Gradients," in Nature Machine Intelligence, 2021
  • A. Ross, M. Hughes, and F. Doshi-Velez, "Right for the Right Reasons: Training Differentiable Models by Constraining their Explanations," in International Joint Conference on Artificial Intelligence (IJCAI), 2017
  • A. Shrikumar, P. Greenside, A. Shcherbina, and A. Kundaje, "Not just a black box: Learning important features through propagating activation differences," arXiv:1605.01713 [cs.LG], 2016
  • K. Simonyan, A. Vedaldi, and A. Zisserman, "Deep inside convolutional networks: Visualising image classification models and saliency maps," in International Conference on Learning Representations (ICLR), 2014
  • M. Sundararajan, A. Taly, and Q. Yan, "Axiomatic attribution for deep networks," in International Conference on Machine Learning (ICML), 2017

Disclosure of Funding

This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No. 866008). The project has also been supported in part by the State of Hesse through the cluster projects "The Third Wave of Artificial Intelligence (3AI)" and "The Adaptive Mind (TAM)".