ISBI 2022 KNIGHT Challenge: Kidney clinical Notes and Imaging to Guide and Help personalize Treatment and biomarkers discovery
Keywords: Radiology, KiTS, CT, Renal Cancer, Accelerated Discovery
See https://github.com/BiomedSciAI/fuse-med-ml#installation
The aim of the KNIGHT challenge is to facilitate the development of Artificial Intelligence (AI) models for automatic preoperative prediction of risk class for patients with renal masses identified in clinical Computed Tomography (CT) imaging of the kidneys.
The dataset, we name the Kidney Classification (KiC) dataset, is based on the 2021 Kidney and Kidney Tumor Segmentation challenge (KiTS) and extended to include additional CT phases and clinical information, as well as risk classification labels, deducted from postoperative pathology results.
Some of the clinical information will also be available for inference. The patients are classified into five risk groups in accordance with American Urological Association (AUA) guidelines.
These groups can be divided into two classes based on the follow-up treatment.
The challenge consists of three tasks: (1) binary patient classification as per the follow-up treatment, (2) fine-grained classification into five risk groups and (3) discovery of prognostic biomarkers.
Kidney Classification (KiC) dataset. Details can be found in challenge website. You can also get the dataset by running download_data.sh
in the location where you want the data to be stored locally.
The participants should submit a .csv file per task containing a row with class scores for each patient in the test set. The rows must adhere to the following scheme:
Task 1 Prediction File: [case_id,NoAT-score,CanAT-score]
See example prediction file for task 1
Task 2 Prediction File: [case_id,B-score,LR-score,IR-score,HR-score,VHR-score]
See example prediction file for task 2
Here, "case_id" represents the sample (e.g. 00000) and all scores represent the probability of a patient to belong to a class.
The evaluation script together with a dummy prediction file can be found in fuse_examples/imagin/classification/knight/eval
More details can be found in challenge website
To run the evaluation script use the following command:
cd fuse_examples/imaging/classification/knight/eval
python eval.py <target_filename> <task1_prediction_filename> <task2_prediction_filename> <output dir>
If you only want to evaluate Task 1, you may pass an empty string in place of <task2_prediction_filename>
, and vice-versa.
As an example, this command will evaluate the dummy example predictions and targets:
cd fuse_examples/imaging/classification/knight/eval
python eval.py example/example_targets.csv example/example_task1_predictions.csv example/example_task2_predictions.csv example/results
This example demonstrates a basic classification pipeline for the KNIGHT challenge Kidney Classification (KiC) dataset. We utilize the FuseMedML library for all stages in the pipeline including data preprocessing, feature extraction, network training and metrics calculation. In this example, we extract features from the CT volumes using a 3D ResNet-18 backbone, concatenate them with available clinical features processed by a smaller fully connected network, and classify the combined features into two risk groups as defined in the 1st task of the KNIGHT challenge.
-
As a preliminary step, make sure you install FuseMedML according to the installation instructions. In order to understand the basics of the library in high level, please read as a minimum this user guide. You can also access more documentation including code examples and tutorials from the FuseMedML Github page.
-
Download the KNIGHT database from the official KNIGHT database repository. The KNIGHT database is strongly based on the KiTS21 database. In KiTS21 the task was semantic segmentation, and there are segmentation annotations for it. You might want to benefit from them in your approach to the KNIGHT challenge and you may access them by cloning the official KiTS21 database repository. But you won't need them for this example.
-
cd
intoKNIGHT
and runpython knight/scripts/get_imaging.py
to download the imaging. -
Set the environment variable
KNIGHT_DATA
to your local path to the cloned KNIGHT database repository (which should now contain the imaging and clinical data underknight/data
). Additionally, set the environment variableKNIGHT_CACHE
to the path where you want to store cached data. Lastly, set theKNIGHT_RESULTS
environment variable to the location where you want to store your trained models. -
Now you can run the
baseline/fuse_baseline.py
script which trains and evaluates our baseline model. In the top of the script, are some parameters that you can change. Theuse_data
dictionary sets whether to use only clinical data, only imaging, or both. We encourage you to develop a solution which utilizes and benefits from both. The "clinical only" mode trains much faster and requires significantly less resources, as expected. Theresize_to
parameter sets a common size to which we resize the input volumes. Important Note: Once you runfuse_baseline.py
for the first time, it will perform caching of the data and store the cached data in yourKNIGHT_CACHE
folder. The next time you runfuse_baseline.py
, it will skip this process, and use the existing, already cached data. You need to be aware of this, because if you want to modify anything related to the data (for example, theresize_to
parameter), then you will need to manually delete the contents ofKNIGHT_CACHE
folder, to allow the caching process to take place again with the new parameters. -
While running the
fuse_baseline.py
script, you can monitor your model training using an instance of Tensor Board, for which thelogdir
param is set to yourKNIGHT_RESULTS
directory.
The test data (without labels) will be sent to challenge participants soon.
In order to run a trained baseline model on it, run make_targets_file.py
with the model_dir
argument set to your trained model directory, data_path
set to the test data path, and split
set to None
.
In this example we perform some basic preprocessing on the imaging data. We downsize the images by a factor of 2 in the X and Y dimensions (to 256), which is only meant for efficiency and not necessarily recommended. Additionally we resize in the Z dimension to 110, the median number of slices. We clip the voxel values to [-62, 310] (corresponds to 0.5 and 99.5 percentile in the foreground regions). Then, we subtract 104.9 and divide by 75.3 (corresponds to mean and std in the foreground regions, respectively). We also normalize the clinical features according to their approximate expected maximum value.
We used two 32GB V-100 GPUs for running this baseline and reporting the results below.
However, we did manage to train an "Imaging + Clinical" model with comparable results on a single 12GB GPU by reducing the batch size to 2 and changing the resize_to
parameter to (256, 256, 64)
.
Running the code with the parameters currently set in fuse_baseline.py
gave the following best epoch validation AUC results:
Clinical data only | Imaging only | Imaging + Clinical |
---|---|---|
0.87 | 0.71 | 0.85 |
Clinical data only | Imaging only | Imaging + Clinical |
---|---|---|
0.82 | 0.68 | 0.80 |
The per-class AUC for the Imaging + Clinical model is:
Benign | Low Risk | Intermediate Risk | High Risk | Very High Risk |
---|---|---|---|---|
0.87 | 0.85 | 0.66 | 0.91 | 0.72 |
Our goal in this example is to provide easy to follow code that can help you get started with the KNIGHT challenge using FuseMedML. We encourage you to experiment with, improve, or change completely any step in the proposed pipeline. We also encourage you to learn about solutions to the KiTS19 and KiTS21 challenges, to gain useful insight into working with this imaging data, towards a related, but different task.
Here are some of the things we knowingly avoided for the sake of simplicity:
-
We use most but not all of the available clinical features, and we simplify some of those that we do use. For example, we use a simplified binary version of the "comorbidities" feature which only cares whether or not a patient has any comorbidity. In practice the data contains more specific information, which you may want to use.
-
We didn't make any use of the segmentation labels that are given in the KiTS database. You can use them in any way to improve your preprocessing, data sampling or model training. Note that the segmentations won't be available for the test cases.
-
We didn't resample the images with respect to their spacing, but only resized to a common voxel size. Addressing the trade-off between input patch size (limited by the GPU memory) and the amount of contextual information that it contains (controlled by a possible resampling procedure) can be important. You may want to resample the volumes to a common spacing, and you may want (or be forced to, due to GPU memory constraints), to train on smaller cropped patches, with some logic which "prefers" foreground patches.
'fuse_examples/imaging/classification/knight/make_targets_file.py' is a script that makes a targets file for the evaluation script.
Targets file is a csv file that holds just the labels for both tasks. This files is one of the inputs of the evaluation script.
The script extracts the labels from the PyTorch dataset included in baseline implementation.
The baseline implementation is using specific train/validation split, You can either use the same train/validation split or set a different split.
The script including additional details and documentation can be found in: 'fuse_examples/imaging/classification/knight/make_targets_file.py'
'fuse_examples/imaging/classification/knight/make_predictions_file.py' is a script that automatically makes predictions files for any model trained using FuseMedML.
Predictions file is a csv file that include prediction score per class and should adhere a format specified in evaluation section.
The script including additional details and documentation can be found in: 'fuse_examples/imaging/classification/knight/make_predictions_file.py'