[Figure: counterfactual examples. Columns: Classifier | Classifier Input (x0) | Counterfactual (x-λ) | Counterfactual Video | Counterfactual Alignment Plot. Rows: smiling; smiling (biased by arched_eyebrows).]
Examples showing the CF alignment of face classifiers, one of which is biased by arched_eyebrows. The bias is illustrated visually as well as in the alignment plots, where arched_eyebrows is observed to align with smiling.
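The biased classifier shown above can be fabricated by linearly combining the outputs of existing attribute classifiers, which matches the naming used later on this page (e.g., heavy_makeup-0.02big_lips). Below is a minimal, hypothetical PyTorch sketch; `f_smiling` and `f_arched_eyebrows` stand in for pretrained attribute classifiers, and the weight of 1.0 is illustrative.

```python
import torch

class ComposedClassifier(torch.nn.Module):
    """Combine two attribute classifiers into one output.

    With a positive weight this fabricates a biased classifier; with a
    negative weight it can counteract a spurious attribute.
    """

    def __init__(self, base, other, weight):
        super().__init__()
        self.base, self.other, self.weight = base, other, weight

    def forward(self, x):
        return self.base(x) + self.weight * self.other(x)

# Hypothetical usage: a smiling classifier deliberately biased by
# arched_eyebrows, as in the figure above.
# biased_smiling = ComposedClassifier(f_smiling, f_arched_eyebrows, 1.0)
```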
Models driven by spurious correlations often yield poor generalization performance. We propose the counterfactual (CF) alignment method to detect and explore spurious correlations in black-box classifiers. Counterfactual images generated with respect to one classifier can be fed into other classifiers to test whether they also induce changes in those classifiers' outputs. The relationship between these responses can be quantified to identify specific instances where a spurious correlation exists, as well as to compute aggregate statistics over a dataset. Our work demonstrates the ability to detect spurious correlations in face attribute classifiers. This is validated by observing intuitive trends in a face attribute classifier as well as by fabricating spurious correlations and detecting their presence, both visually and quantitatively. Further, using the CF alignment method, we demonstrate that we can rectify spurious correlations identified in classifiers.
Overview of the CF alignment methodology. An image is encoded, reconstructed, and then processed by a classifier. The counterfactual is generated by subtracting the gradient of the classifier output w.r.t. the latent representation, following the Latent Shift approach (Cohen et al., 2021). The resulting representation is decoded back into an image. The reconstructed images are processed with multiple classifiers, and the classifier outputs can be plotted side by side to study their alignment. The base model's value can be used as the x-axis to more easily compare it to the predictions of another classifier. The output changes can be quantified and compared using relative change.
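As a concrete sketch of this pipeline, the counterfactual generation step can be written in a few lines of PyTorch. This is a minimal, hypothetical implementation assuming a differentiable autoencoder split into `encoder`/`decoder` halves and a differentiable classifier; `lam` (the λ in x-λ above) is an illustrative step size.

```python
import torch

def latent_shift_counterfactual(x, encoder, decoder, classifier, lam=100.0):
    """Generate a counterfactual by shifting the latent representation
    against the gradient of the classifier output (Latent Shift)."""
    z = encoder(x).detach().requires_grad_(True)
    y = classifier(decoder(z))                 # prediction on reconstruction
    grad = torch.autograd.grad(y.sum(), z)[0]  # d(output) / d(latent)
    return decoder(z - lam * grad)             # decode the shifted latent

# CF alignment: run the counterfactual through *other* classifiers and
# compare how their outputs move relative to the base classifier.
# x_cf = latent_shift_counterfactual(x, encoder, decoder, f_smiling)
# delta = f_arched_eyebrows(x_cf) - f_arched_eyebrows(x)
```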
Relationships between face attribute classifiers as measured by CF-induced relative changes (left), classifier predictions (middle), and training data labels (right). In (a), base classifiers are along the rows and downstream classifiers are along the columns. Comparing (a) to (b) and (c) shows that many relationships reflected in the CF outputs are preserved from correlations in the training data. The non-symmetric relationship between male and wearing_lipstick, highlighted in green, reflects expected interactions between the two face attribute classifiers. The relationship between male and big_nose, highlighted in red, is strong in both the classifier predictions and the ground truth labels but weak in CF alignment, indicating that although these features are correlated, they are not exploited by the classifier.
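Each cell of the CF matrix in (a) can be computed as a relative change: the movement a base classifier's counterfactual induces in a downstream classifier, normalized by the movement of the base classifier itself. A sketch under that assumed definition (the paper's exact normalization may differ):

```python
import torch

def relative_change(f_base, f_down, x, x_cf):
    """Relative change of a downstream classifier's output induced by a
    counterfactual x_cf generated for the base classifier."""
    with torch.no_grad():
        d_base = f_base(x_cf) - f_base(x)  # movement of the base output
        d_down = f_down(x_cf) - f_down(x)  # induced downstream movement
        return (d_down / d_base).mean().item()

# Averaging this over a dataset for every (base, downstream) pair yields
# the CF alignment matrix shown in (a).
```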
[Figure: counterfactual examples. Columns: Classifier | Classifier Input (x0) | Counterfactual (x-λ) | Counterfactual Video | Counterfactual Alignment Plot. Rows: pointy_nose; attractive; big_lips.]
Examples showing the CF alignment of face classifiers. The relative change with respect to the base classifier is shown next to the name of the classifier in the legend.
We can use CF alignment to fix model bias in classifiers by using the relative change between model predictions as a loss function that can be minimized.
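A simple way to realize this, consistent with the heavy_makeup-0.02big_lips example below, is to search for a corrective weight on the spurious classifier's output that drives the relative change toward zero. The sketch below minimizes this loss by scanning candidate weights; the paper's actual optimization may be gradient-based, and `make_cf` is assumed to be a counterfactual generator such as `latent_shift_counterfactual` above.

```python
import torch

def rectify(f_base, f_spurious, make_cf, images, candidates):
    """Find a weight w so that counterfactuals of the composed classifier
    f(x) = f_base(x) + w * f_spurious(x) induce minimal relative change
    in the spurious classifier (minimizing the squared relative change)."""
    best_w, best_loss = None, float("inf")
    for w in candidates:
        composed = lambda x: f_base(x) + w * f_spurious(x)
        loss = 0.0
        for x in images:
            x_cf = make_cf(composed, x)
            with torch.no_grad():
                rc = (f_spurious(x_cf) - f_spurious(x)) / (
                    composed(x_cf) - composed(x))
                loss += (rc ** 2).mean().item()
        if loss < best_loss:
            best_w, best_loss = w, loss
    return best_w

# Hypothetical usage, matching the example below:
# make_cf = lambda clf, x: latent_shift_counterfactual(x, encoder, decoder, clf)
# w = rectify(f_heavy_makeup, f_big_lips, make_cf, images,
#             torch.linspace(-0.1, 0.1, 21))   # could land near -0.02
```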
Below are counterfactual alignment matrices showing the relative change before and after optimization to remove the big_nose bias, evaluated on a held-out test set. The figure shows a dramatic reduction in relative change between most classifiers and big_nose, with limited residual impact on other relationships.
[Figure: counterfactual examples. Columns: Classifier | Classifier Input (x0) | Counterfactual (x-λ) | Counterfactual Video | Counterfactual Alignment Plot. Rows: heavy_makeup; heavy_makeup-0.02big_lips.]
An example of a big_lips spurious correlation being corrected for the heavy_makeup classifier. The lip size remains the same as the makeup is removed.
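Expressed with the `ComposedClassifier` sketch from earlier, the corrected classifier in this example would be the following (the -0.02 weight comes from the figure's naming):

```python
# Hypothetical: reuses ComposedClassifier and the pretrained attribute
# classifiers assumed in the earlier sketches.
corrected = ComposedClassifier(f_heavy_makeup, f_big_lips, -0.02)
```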
@Article{cohen2024cfalignment,
author = {Cohen, Joseph Paul and Blankemeier, Louis and Chaudhari, Akshay},
title = {Identifying Spurious Correlations using Counterfactual Alignment},
year = {2024},
url = {}
}
We would like to thank Stanford University and the Stanford Research Computing Center for providing computational resources and support that contributed to these research results.
Cohen, J. P., Brooks, R., En, S., Zucker, E., Pareek, A., Lungren, M. P., & Chaudhari, A. (2021). Gifsplanation via Latent Shift: A Simple Autoencoder Approach to Counterfactual Generation for Chest X-rays. Medical Imaging with Deep Learning. https://openreview.net/forum?id=rnunjvgxAMt