This repository provides a pipeline for training and evaluating 2D and 3D diffusion autoencoders, traditional autoencoders, and various variational autoencoders for unsupervised latent representation learning from 2D and 3D images, primarily focusing on MRIs. It was developed as part of the paper titled "Unsupervised cardiac MRI phenotyping with 3D diffusion autoencoders reveals novel genetic insights" and was utilised to learn and infer latent representations from cardiac (CINE) MRIs using a 3D diffusion autoencoder.
Inside the Executors package, you will find the actual execution scripts. For different problems, the relevant main files can be placed here. Sub-packages may also be created within this package to better organise different experiments.
- _main.py_: This is the main script used to run the pipeline. It contains all the default values for the various command-line arguments, which can be supplied when invoking this script.
Currently, three distinct main files are available:
- `main_recon.py`: For 2D and 3D autoencoders (including VAEs but excluding diffusion autoencoders).
- `main_diffAE.py`: For 2D and 3D diffusion autoencoders.
- `main_latentcls.py`: For training a classifier on the latent space.
Inside the Configs folder, there are two types of files required by the main scripts. Sub-folder structures can be created within this folder to better organise different experiments.
- _config.yaml_: These files contain configuration parameters, typically specific to different aspects of the pipeline (e.g., learning rate scheduler parameters). As these parameters are less likely to change frequently, they are defined here in a hierarchical format. However, these values can also be overridden using command-line arguments, which will be discussed later.
- _dataNpath.json_: As the name suggests, these files include dataset-specific parameters, such as the name of the foldCSV file and the run prefix. They also define necessary paths, such as the data directory (where the `data.h5` file, the supplied CSV for folds, and the processed dataset files are stored).
Inside the Engineering package, all the files related to the actual implementation of the pipeline are stored.
A conda environment can be created using the provided `environment.yml` file.
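For example:

conda env create -f environment.yml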
To execute, the call should be made from the root directory of the pipeline. For example:
python Executors/main_recon.py --batch_size 32 --lr 0.0001 --training§LRDecay§type1§decay_rate 0.15 --training§prova testing --json§save_path /myres/toysets/Results
Here, `main_recon.py` is the main file to be executed. `batch_size` and `lr` are arguments that replace the default values for those parameters supplied within that main file. `training§LRDecay§type1§decay_rate` and `training§prova` (arguments not specified inside the main file or any of the Engines) override the values present inside the `config.yaml` file mentioned inside the main file (or supplied as a command-line argument), following the path obtained by splitting the key at each `§`. Finally, `json§save_path` (handled like the previous ones, but starting with `json§`) overrides the `save_path` value inside the `dataNpath.json` specified inside the main file (or supplied as a command-line argument).
Please note:
`training§LRDecay§type1§decay_rate` will try to find the dictionary path `training/LRDecay/type1/decay_rate` inside the YAML file. If the path is found, the value is updated (for the current run only) with the supplied one. If it is not found, as with `training§prova` in this example, a new path is created and the value added (for the current run only). Any command-line argument that is not found inside the main file, or any of the Engines or Warp Drives, is treated as an "unknown" parameter and handled in this manner, unless it starts with `json§`; in that case, it is used to update the value of the corresponding parameter inside the `dataNpath.json` for the current run.
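Conceptually, these overrides perform a nested-dictionary update. A minimal Python sketch of the idea (not the pipeline's actual implementation):

```python
def apply_override(config: dict, key: str, value, sep: str = "§") -> dict:
    """Set `value` at the nested path encoded in `key`, creating missing levels."""
    *path, leaf = key.split(sep)
    node = config
    for part in path:
        node = node.setdefault(part, {})  # unknown paths are created on the fly
    node[leaf] = value
    return config

config = {"training": {"LRDecay": {"type1": {"decay_rate": 0.1}}}}
apply_override(config, "training§LRDecay§type1§decay_rate", 0.15)  # existing path: value updated
apply_override(config, "training§prova", "testing")                # unknown path: created anew
```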
For the complete list of command-line arguments, please refer to the main files, or execute a main file with the `--help` flag. For example:
python Executors/main_diffAE.py --help
and also check the files inside the Configs folder.
To run inference with a trained model, add the following command-line arguments to the ones used during training:
--resume --load_best --run_mode 2
To run inference with a model that is currently in training (i.e., interim inference), add the following:
--resume --load_best --run_mode 2 --output_suffix XYZ
where XYZ is a suffix that would be added to the Output directory. To run inference on the whole dataset (ignoring the splits), add the following:
--resume --load_best --run_mode 2 --output_suffix fullDS --json§foldCSV 0
(fullDS can also be changed to something else)
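For instance, if the model from the earlier training example were to be run in inference mode over the whole dataset, the call might look like this (the training arguments shown are illustrative):

python Executors/main_recon.py --batch_size 32 --lr 0.0001 --resume --load_best --run_mode 2 --output_suffix fullDS --json§foldCSV 0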
This pipeline expects an HDF5 file containing the dataset as input, following the structure described below.
The groups should follow the path:
patientID/fieldID/instanceID
Groups may (optionally) include the following attributes: DICOM Patient ID, DICOM Study ID, description of the study, AET (model of the scanner), Host, date of the study, number of series in the study:
patientDICOMID, studyDICOMID, studyDesc, aet, host, date, n_series
Each series present is stored as a separate dataset, and the key for each dataset can be one of the following:
- `primary`: To store the primary data (i.e., the series description matches one of the values of the `primary_data` attribute).
- `primary_*`: (Optional) If the `multi_primary` attribute is set to `True`, then instead of a single `primary` key, there will be multiple keys in the form `primary_*`, where `*` is replaced by the corresponding tag supplied using the `primary_data_tags` attribute.
- `auxiliary_*`: (Optional) To store auxiliary data (e.g., T1 maps in the case of ShMoLLI). Here, `*` is replaced by the corresponding tag supplied using the `auxiliary_data_tags` attribute.
Additional type information may be appended to the dataset key:
- `_sagittal`, `_coronal`, or `_transverse`: If the `default_plane` attribute is not present, or the acquisition plane (determined using the DICOM header tag `0020|0037`) of the series differs from the value specified in the `default_plane` attribute, the plane is appended to the key.
- `_0` to `_n`: If the `repeat_acq` attribute is set to `True`, `_0` is appended to the first acquisition with the key created following the above rules. Subsequent acquisitions with the same key will have suffixes like `_1`, `_2`, ..., `_n`. If `repeat_acq` is set to `False` (or not supplied), `_0` is not appended, and any repeated occurrence of the same key is ignored (after logging an error).
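The repeat-acquisition suffixing can be sketched as follows (a simplified illustration of the rules above, not the pipeline's actual code):

```python
def resolve_key(existing_keys, base, repeat_acq=False):
    """Return the dataset key to use, or None if the repeated series should be skipped."""
    if not repeat_acq:
        # Without repeat_acq, a repeated key is ignored (after logging an error).
        return None if base in existing_keys else base
    n = 0
    while f"{base}_{n}" in existing_keys:
        n += 1
    return f"{base}_{n}"  # "_0" for the first acquisition, then "_1", "_2", ...
```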
The value of each dataset must be the data itself.
Each dataset may (optionally) include the following attributes: series ID, DICOM header, description of the series, and the min and max intensity values of the series (of the magnitude, in the case of complex-valued data):
seriesID, DICOMHeader, seriesDesc, min_val, max_val
For volumetric normalisation modes (e.g., `norm_type = divbymaxvol` or `norm_type = minmaxvol`), the `min_val` and `max_val` attributes are required.
The dataset must be 5D, with the following shape:
Channel : Time : Slice : X : Y
- `Channel`: This dimension is used to stack different MRIs from multi-echo or multi-TIeff MRI acquisitions, referred to as "Channels". In the case of multi-contrast MRIs, this dimension can also be used, but only if the images are co-registered. If there is only one channel, the shape of this dimension is 1.
- `Time`: For dynamic MRIs (and other dynamic acquisitions), the different time points should be concatenated along this dimension. If there is only one time point, the shape of this dimension is 1.
- `Slice`: For 3D MRIs, this dimension stores the different slices. For 2D acquisitions, the shape of this dimension will be 1.
- `X` and `Y`: These represent the in-plane spatial dimensions.
For any other type of data, the dimensions can be reshaped to fit this structure (i.e., unnecessary dimensions can be set to have a shape of 1).
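Putting the pieces together, here is a minimal `h5py` sketch of the expected layout (the IDs, attribute values, and shapes are hypothetical):

```python
import h5py
import numpy as np

# Hypothetical 5D volume: Channel : Time : Slice : X : Y
volume = np.random.rand(1, 50, 1, 208, 210).astype(np.float32)

with h5py.File("data.h5", "w") as h5:
    # Group path follows patientID/fieldID/instanceID (values here are made up)
    grp = h5.create_group("1000001/20208/2_0")
    grp.attrs["studyDesc"] = "CINE heart study"  # optional group attribute

    ds = grp.create_dataset("primary", data=volume)
    ds.attrs["seriesDesc"] = "CINE_segmented_LAX"  # optional dataset attribute
    ds.attrs["min_val"] = float(np.abs(volume).min())  # required for volumetric normalisation
    ds.attrs["max_val"] = float(np.abs(volume).max())
```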
Note: In this research, the UK Biobank MRI ZIP files were processed using the script available at https://github.com/GlastonburyGroup/CardiacDiffAE_GWAS/blob/master/preprocess/createH5s/createH5_MR_DICOM.py to create the corresponding HDF5 file.
The 3D DiffAE models trained on the CINE Cardiac Long Axis MRIs from UK Biobank as part of the research Unsupervised cardiac MRI phenotyping with 3D diffusion autoencoders reveals novel genetic insights are available on Hugging Face.
To use the weights (without this pipeline), you can directly load them using the Hugging Face Transformers library, or you can use them with this pipeline by supplying the `--load_hf` argument to the main files. For example:
python Executors/main_diffAE.py --load_hf GlastonburyGroup/UKBBLatent_Cardiac_20208_DiffAE3D_L128_S1701
After loading, the model can be trained further (treating our weights as pretrained weights) or used for inference (following the instructions in the previous section, except for the `--resume --load_best` flags).
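To load the weights outside this pipeline, a hedged sketch using the Transformers auto classes (this assumes the Hugging Face repository ships custom modelling code; consult the model card for the exact usage):

```python
from transformers import AutoModel

# Assumption: the checkpoint provides custom modelling code on the Hub,
# so trust_remote_code=True is needed to instantiate the DiffAE architecture.
model = AutoModel.from_pretrained(
    "GlastonburyGroup/UKBBLatent_Cardiac_20208_DiffAE3D_L128_S1701",
    trust_remote_code=True,
)
```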
An application has also been hosted on Hugging Face Spaces, where you can use your own MRIs to infer latent representations using the trained 3D DiffAE models.
If you find this work useful or utilise this pipeline (or any part of it) in your research, please consider citing us:
@article{Ometto2024.11.04.24316700,
author = {Ometto, Sara and Chatterjee, Soumick and Vergani, Andrea Mario and Landini, Arianna and Sharapov, Sodbo and Giacopuzzi, Edoardo and Visconti, Alessia and Bianchi, Emanuele and Santonastaso, Federica and Soda, Emanuel M and Cisternino, Francesco and Ieva, Francesca and Di Angelantonio, Emanuele and Pirastu, Nicola and Glastonbury, Craig A},
title = {Unsupervised cardiac MRI phenotyping with 3D diffusion autoencoders reveals novel genetic insights},
elocation-id = {2024.11.04.24316700},
year = {2024},
doi = {10.1101/2024.11.04.24316700},
publisher = {Cold Spring Harbor Laboratory Press},
url = {https://www.medrxiv.org/content/early/2024/11/05/2024.11.04.24316700},
journal = {medRxiv}
}
This pipeline is developed by Dr Soumick Chatterjee (as part of the Glastonbury Group, Human Technopole, Milan, Italy) based on the NCC1701 pipeline from the paper ReconResNet: Regularised residual learning for MR image reconstruction of Undersampled Cartesian and Radial data. Special thanks to Dr Domenico Iuso for collaborating on enhancing the NCC1701 and this pipeline with the latest PyTorch (and related) features, including DeepSpeed, and to Rupali Khatun for her contributions to the pipeline with the complex-valued autoencoders.
The 2D DiffAE model in this repository is based on the paper Diffusion Autoencoders: Toward a Meaningful and Decodable Representation. The repository adapts the code from the original DiffAE repository to work with non-RGB images (e.g., MRIs) and extends it to 3D for processing volumetric images.
If you are using the DiffAE model from this repository, in addition to citing our paper mentioned above, please also cite the original paper:
@inproceedings{preechakul2021diffusion,
title={Diffusion Autoencoders: Toward a Meaningful and Decodable Representation},
author={Preechakul, Konpat and Chatthee, Nattanat and Wizadwongsa, Suttisak and Suwajanakorn, Supasorn},
booktitle={IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
year={2022},
}
For non-diffusion autoencoders (including VAEs), this pipeline utilises and extends (e.g., with additional models, including complex-valued models) the pythae package. This package has been integrated into our pipeline; to use models from it, one must supply `0` as the `modelID`, the model name (from the list in the package's `__init__.py` file located inside `Engineering/Engines/WarpDrives/pythaeDrive`) as `pythae_model`, and the relative path (from `Engineering/Engines/WarpDrives/pythaeDrive/configs`) to the configuration JSON file for pythae as `pythae_config`. The configuration file is optional; if left blank, the default configuration will be used.
For example, for the Factor VAE's configuration file intended for the CelebA dataset, `originals/celeba/factor_vae_config.json` must be supplied. Default configurations can also be found in the same `__init__.py` file, which must be updated whenever a new model or configuration is added to the package.
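Putting this together, an invocation might look like the following (the model name here is illustrative; check the `__init__.py` mentioned above for the exact registered names):

python Executors/main_recon.py --modelID 0 --pythae_model factor_vae --pythae_config originals/celeba/factor_vae_config.json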
The original configuration files (including the entire package) are intended for a few "toy" image datasets (binary_mnist, celeba, cifar10, dsprites, and mnist). Since CelebA is the most complex dataset among them, we have chosen those configurations as defaults. However, these configurations may need to be modified to suit our specific tasks.
If you are using any of the non-diffusion autoencoder (including VAEs) models from this repository, in addition to citing our paper mentioned above, please also cite the original paper:
@inproceedings{chadebec2022pythae,
author = {Chadebec, Cl\'{e}ment and Vincent, Louis and Allassonniere, Stephanie},
booktitle = {Advances in Neural Information Processing Systems},
editor = {S. Koyejo and S. Mohamed and A. Agarwal and D. Belgrave and K. Cho and A. Oh},
pages = {21575--21589},
publisher = {Curran Associates, Inc.},
title = {Pythae: Unifying Generative Autoencoders in Python - A Benchmarking Use Case},
volume = {35},
year = {2022}
}