In this study, we comprehensively evaluate popular saliency map methods for medical imaging classification models trained on the SIIM-ACR Pneumothorax Segmentation and RSNA Pneumonia Detection datasets in terms of 4 key criteria for trustworthiness:
- Utility
- Sensitivity to weight randomization
- Repeatability
- Reproducibility
Together, these trustworthiness criteria provide a blueprint for objectively assessing a saliency map's localization capabilities (localization utility), its sensitivity to the trained model weights (versus randomized weights), and its robustness across models trained with the same architecture (repeatability) and with different architectures (reproducibility). Meeting these criteria is essential for a clinician to trust a saliency map's ability to localize the finding of interest.
For model interpretation, we evaluate the trustworthiness of the following saliency methods: Gradient Explanation (GRAD), SmoothGrad (SG), Integrated Gradients (IG), Smooth Integrated Gradients (SIG), GradCAM, XRAI, Guided Backpropagation (GBP), and Guided GradCAM (GGCAM).
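For illustration only (this repository's pipeline may differ), most of these methods are available in off-the-shelf attribution libraries. The sketch below uses Captum with a PyTorch ResNet-50 as a compact stand-in for our InceptionV3 models; the 2-class head, input tensor `x`, `target` class, and the choice of `model.layer4` as the GradCAM layer are all assumptions. XRAI is not implemented in Captum (see, e.g., the PAIR-code `saliency` package).

```python
import torch
from torchvision.models import resnet50
from captum.attr import (Saliency, IntegratedGradients, NoiseTunnel,
                         GuidedBackprop, GuidedGradCam, LayerGradCam)

# Placeholder 2-class model and input; the study's own models are
# InceptionV3 networks trained on the chest radiograph datasets.
model = resnet50(weights=None, num_classes=2).eval()
x = torch.randn(1, 3, 224, 224)  # stand-in for a preprocessed radiograph
target = 1                       # "finding present" class index (assumed)

grad = Saliency(model).attribute(x, target=target)                       # GRAD
sg = NoiseTunnel(Saliency(model)).attribute(
    x, nt_type="smoothgrad", nt_samples=25, target=target)               # SG
ig = IntegratedGradients(model).attribute(x, target=target)              # IG
sig = NoiseTunnel(IntegratedGradients(model)).attribute(
    x, nt_type="smoothgrad", nt_samples=25, target=target)               # SIG
gradcam = LayerGradCam(model, model.layer4).attribute(x, target=target)  # GradCAM
gbp = GuidedBackprop(model).attribute(x, target=target)                  # GBP
ggcam = GuidedGradCam(model, model.layer4).attribute(x, target=target)   # GGCAM
```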
We evaluate the localization utility of each saliency method by quantifying its intersection with the ground-truth pixel-level segmentations available from the SIIM-ACR Pneumothorax dataset and the ground-truth bounding boxes available from the RSNA Pneumonia dataset, respectively. To capture the intersection between the saliency maps and the segmentations or bounding boxes, we treat the pixels inside the segmentations as positive labels and those outside as negative. Each pixel of the saliency map is then treated as the output of a binary classifier, so all pixels of the map can be used jointly to compute the area under the precision-recall curve (AUPRC) as the utility score.
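The computation reduces to scoring the flattened saliency map against the flattened binary mask. A minimal sketch, assuming `saliency_map` and `gt_mask` are NumPy arrays of the same spatial shape (the names are illustrative; scikit-learn's `average_precision_score` is a standard estimator of the AUPRC):

```python
import numpy as np
from sklearn.metrics import average_precision_score

def localization_auprc(saliency_map: np.ndarray, gt_mask: np.ndarray) -> float:
    """Treat each saliency value as a binary-classifier score for its pixel.

    gt_mask: 1 inside the ground-truth segmentation/box, 0 outside.
    saliency_map: same shape; higher magnitude = more salient (assumed convention).
    """
    scores = np.abs(saliency_map).ravel()  # per-pixel attribution magnitude
    labels = gt_mask.ravel().astype(int)   # positive inside the annotation
    return average_precision_score(labels, scores)
```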
To investigate the sensitivity of saliency methods to changes in model parameters, and to identify whether particular layers drive changes in the maps, we employ cascading randomization. In cascading randomization, we successively randomize the weights of the trained model from the top layer down to the bottom one, erasing the learned weights in a gradual fashion. After each randomization step, we compute the Structural SIMilarity (SSIM) index between the original saliency map and the map generated from the partially randomized model to assess how much the map changes.
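A minimal sketch of this procedure, assuming a Keras-style model and a hypothetical `make_saliency_map(model, image)` helper that returns a 2D float map:

```python
import numpy as np
from skimage.metrics import structural_similarity as ssim

def cascading_randomization(model, image, make_saliency_map, seed=0):
    """Top-down cascading randomization for a Keras model (illustrative sketch).

    Note: this mutates `model` in place; clone the model first if the
    trained weights are still needed.
    """
    rng = np.random.default_rng(seed)
    baseline = make_saliency_map(model, image)
    ssim_trace = []
    # Keras lists layers bottom-up, so reverse to randomize top layers first.
    for layer in reversed(model.layers):
        weights = layer.get_weights()
        if not weights:
            continue  # skip parameter-free layers (pooling, activation, ...)
        layer.set_weights([rng.normal(0.0, 0.05, size=w.shape) for w in weights])
        randomized = make_saliency_map(model, image)
        ssim_trace.append(ssim(baseline, randomized,
                               data_range=float(baseline.max() - baseline.min())))
    return ssim_trace
```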
We conduct repeatability and reproducibility tests on the saliency methods by comparing maps from (a) different randomly initialized instances of models with the same architecture, each trained to convergence (intra-architecture repeatability), and (b) models with different architectures, each trained to convergence (inter-architecture reproducibility), using the SSIM between the saliency maps produced by each model. These experiments test whether a saliency method produces similar maps from a different set of trained weights and whether it is architecture agnostic (assuming that models with different trained weights or architectures have similar classification performance).
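Both tests reduce to pairwise SSIM between maps produced by different models for the same image. A small sketch (the function name and data layout are assumptions, not the repository's API):

```python
import itertools
import numpy as np
from skimage.metrics import structural_similarity as ssim

def mean_pairwise_ssim(maps):
    """Average SSIM over all pairs of saliency maps for a single image.

    `maps` holds one 2D float map per trained model: replicates of the same
    architecture for repeatability, different architectures for reproducibility.
    """
    scores = []
    for a, b in itertools.combinations(maps, 2):
        rng = float(max(a.max(), b.max()) - min(a.min(), b.min()))
        scores.append(ssim(a, b, data_range=rng))
    return float(np.mean(scores))
```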
More details on the experiments can be found in the manuscript.
The models used for all experiments are available here. They include 3 replicates of the InceptionV3 network trained on the RSNA Pneumonia dataset and 3 replicates trained on the SIIM-ACR Pneumothorax dataset. The splits used for training are linked here and here, respectively.
For the cascading randomization and repeatability/reproducibility tests, saliency map performance was evaluated on a randomly chosen sample of 100 images from each respective test set. These images are included in both PNG and NPY form here and here.