TY - JOUR
T1 - Interpreting Global Perturbation Robustness of Image Models using Axiomatic Spectral Importance Decomposition
AU - Luo, Róisín
AU - McDermott, James
AU - O’Riordan, Colm
N1 - Publisher Copyright:
© 2024, Transactions on Machine Learning Research. All rights reserved.
PY - 2024
Y1 - 2024
AB - Perturbation robustness evaluates the vulnerability of models to a variety of perturbations, such as data corruptions and adversarial attacks. Understanding the mechanisms of perturbation robustness is critical for global interpretability. We present a model-agnostic, global mechanistic interpretability method to interpret the perturbation robustness of image models. This research has two motivations. First, previous global interpretability works, used in tandem with robustness benchmarks such as mean corruption error (mCE), are not designed to directly interpret the mechanisms of perturbation robustness within image models. Second, we observe that the spectral signal-to-noise ratios (SNR) of perturbed natural images decay exponentially with frequency. This power-law-like decay implies that low-frequency signals are generally more robust than high-frequency signals, yet high classification accuracy cannot be achieved with low-frequency signals alone. By applying Shapley value theory, our method axiomatically quantifies the predictive power of robust and non-robust features within an information-theoretic framework. Our method, dubbed I-ASIDE (Image Axiomatic Spectral Importance Decomposition Explanation), provides insight into the mechanisms underlying model robustness. We conduct extensive experiments on a variety of vision models pre-trained on ImageNet, including both convolutional neural networks (e.g. AlexNet, VGG, GoogLeNet/Inception-v1, Inception-v3, ResNet, SqueezeNet, RegNet, MnasNet, MobileNet, EfficientNet) and vision transformers (e.g. ViT, Swin Transformer, and MaxViT), and show that I-ASIDE not only measures perturbation robustness but also interprets its mechanisms.
UR - https://www.scopus.com/pages/publications/85219552901
M3 - Article
AN - SCOPUS:85219552901
SN - 2835-8856
VL - 2024
JO - Transactions on Machine Learning Research
JF - Transactions on Machine Learning Research
ER -