Unsupervised latent representation learning using 2D and 3D diffusion and other autoencoders

This repository provides a pipeline for training and evaluating 2D and 3D diffusion autoencoders, traditional autoencoders, and various variational autoencoders for unsupervised latent representation learning from 2D and 3D images, with a primary focus on MRIs. It was developed as part of the paper titled Unsupervised cardiac MRI phenotyping with 3D diffusion autoencoders reveals novel genetic insights and was utilised to learn and infer latent representations from cardiac MRIs (CINE) using a 3D diffusion autoencoder.

Pipeline

Structure

Inside the Executors package, you will find the actual execution scripts. For different problems, the relevant main files can be placed here. Sub-packages may also be created within this package to better organise different experiments.

  • main.py: This is the main script used to run the pipeline. It contains all the default values for the various command-line arguments, which can be supplied when invoking the script.

Currently, three distinct main files are available:

  1. main_recon.py: For 2D and 3D autoencoders (including VAEs but excluding diffusion autoencoders).
  2. main_diffAE.py: For 2D and 3D diffusion autoencoders.
  3. main_latentcls.py: For training a classifier on the latent space.

Inside the Configs folder, there are two types of files required by the main scripts. Sub-folder structures can be created within this folder to better organise different experiments.

  • config.yaml: These files contain configuration parameters, typically specific to different aspects of the pipeline (e.g., learning rate scheduler parameters). As these parameters are less likely to change frequently, they are defined here in a hierarchical format. These values can nevertheless be overridden using command-line arguments, as discussed later.
  • dataNpath.json: As the name suggests, these files contain dataset-specific parameters, such as the name of the foldCSV file and the run prefix. They also define the necessary paths, such as the data directory (where the data.h5 file, the supplied CSV for folds, and the processed dataset files are stored). A hedged sketch of such a file is shown after this list.
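For orientation, the sketch below (in Python notation, as the file would appear once loaded) illustrates what a dataNpath.json might contain. Only foldCSV and save_path are keys referred to elsewhere in this README; the remaining key names and all values are hypothetical placeholders, and the actual files inside Configs should be taken as the reference:

    # Hedged sketch of a loaded dataNpath.json; only "foldCSV" and "save_path" appear in this README,
    # the other key names and all values are hypothetical placeholders.
    datanpath = {
        "foldCSV": "folds.csv",                  # CSV defining the folds (overridable with --json§foldCSV)
        "save_path": "/myres/toysets/Results",   # where results are written (overridable with --json§save_path)
        "run_prefix": "toy_experiment",          # hypothetical: prefix used to name the run
        "data_path": "/path/to/data_directory",  # hypothetical: directory with data.h5 and the folds CSV
    }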

Inside the Engineering package, all the files related to the actual implementation of the pipeline are stored.

Executing the pipeline

A conda environment can be created using the provided environment.yml file.
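For example, using standard conda commands (the environment name to activate is the one defined inside environment.yml):

    conda env create -f environment.yml
    conda activate <environment-name-from-environment.yml>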

To execute, the call should be made from the root directory of the pipeline. For example:

    python Executors/main_recon.py --batch_size 32 --lr 0.0001 --training§LRDecay§type1§decay_rate 0.15 --training§prova testing --json§save_path /myres/toysets/Results

Here, main_recon.py is the main file to be executed. batch_size and lr are arguments defined inside that main file, so the supplied values replace the corresponding defaults. training§LRDecay§type1§decay_rate and training§prova are arguments that are not defined inside the main file (or any of the Engines); they override the values inside the config.yaml specified in the main file (or supplied as a command-line argument), following the path obtained by splitting the key at the § symbols. Finally, json§save_path (handled like the previous ones, but starting with json§) overrides the save_path parameter inside the dataNpath.json specified in the main file (or supplied as a command-line argument).

Please note:
training§LRDecay§type1§decay_rate will try to find the dictionary path training/LRDecay/type1/decay_rate inside the yaml file. If the path is found, its value will be updated (for the current run only) with the supplied one. If it is not found, as with training§prova in this example, a new path will be created and the value added (for the current run only). Any command-line argument that is not found inside the main file, or in any of the Engines or Warp Drives, is treated as an "unknown" parameter and handled in this manner - unless it starts with json§, in which case it is used to update the value of the corresponding parameter (e.g., save_path) inside the dataNpath.json for the current run.
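For illustration, the following minimal Python sketch (not the pipeline's actual implementation, which lives inside the Engineering package and may differ in details such as type casting and logging) shows how such a §-separated key can map onto the nested dictionary loaded from config.yaml:

    # Hedged sketch of the override mechanism described above.
    config = {"training": {"LRDecay": {"type1": {"decay_rate": 0.1}}}}  # as if loaded from config.yaml

    def override(cfg: dict, key: str, value):
        """Walk the §-separated path, creating missing levels, and set the final value."""
        *path, leaf = key.split("§")
        node = cfg
        for part in path:
            node = node.setdefault(part, {})  # create the level if it does not exist (e.g. training§prova)
        node[leaf] = value

    override(config, "training§LRDecay§type1§decay_rate", 0.15)  # existing path: value is updated
    override(config, "training§prova", "testing")                # missing path: it is created
    print(config)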

For a complete list of command-line arguments, please refer to the main files or execute the main file with the --help flag. For example:

    python Executors/main_diffAE.py --help

and check the files inside the Configs folder.

Running inference on a trained model

Add the following command line arguments to the ones used during training:

    --resume --load_best --run_mode 2

To run inference with a model that is currently in training (i.e. interim inference), add the following:

    --resume --load_best --run_mode 2 --output_suffix XYZ

where XYZ is a suffix that will be appended to the output directory name. To run inference on the whole dataset (ignoring the splits), add the following:

    --resume --load_best --run_mode 2 --output_suffix fullDS --json§foldCSV 0

(fullDS can also be changed to any other suffix)

Dataset

This pipeline expects an HDF5 file containing the dataset as input, following the structure described below.

The groups should follow the path:

patientID/fieldID/instanceID

Groups may (optionally) include the following attributes: DICOM Patient ID, DICOM Study ID, description of the study, AET (model of the scanner), Host, date of the study, number of series in the study:

    patientDICOMID, studyDICOMID, studyDesc, aet, host, date, n_series

Each series in the study is stored as a separate dataset within the group, and the dataset key can be:

  • primary: To store the primary data (i.e., the series description matches one of the values of the primary_data attribute).
  • primary_*: (Optional) If the multi_primary attribute is set to True, then instead of a single primary key, there will be multiple keys in the form primary_*, where * is replaced by the corresponding tag supplied using the primary_data_tags attribute.
  • auxiliary_*: (Optional) To store auxiliary data (e.g., T1 maps in the case of ShMoLLI). Here, * is replaced by the corresponding tag supplied using the auxiliary_data_tags attribute.

Additional type information may be appended to the dataset key:

  1. _sagittal, _coronal, or _transverse:
    If the default_plane attribute is not present, or the acquisition plane (determined using the DICOM header tag 0020|0037) of the series differs from the value specified in the default_plane attribute, the plane is appended to the key.

  2. _0 to _n:
    If the repeat_acq attribute is set to True, "_0" is appended to the first acquisition with the key created following the above rules. Subsequent acquisitions with the same key will have suffixes like _1, _2, ..., _n. If repeat_acq is set to False (or not supplied), "_0" is not appended, and any repeated occurrence of the same key is ignored (after logging an error).

The value of each dataset must be the data itself.

Each dataset may (optionally) include the following attributes: series ID, DICOM header, description of the series, and the minimum and maximum intensity values of the series (of the magnitude, in the case of complex-valued data):

    seriesID, DICOMHeader, seriesDesc, min_val, max_val

For volumetric normalisation modes (e.g., norm_type = divbymaxvol or norm_type = minmaxvol), the min_val and max_val attributes are required.
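As an illustration, the hedged h5py sketch below builds a minimal file with this structure. All IDs, attribute values, and the array are dummy placeholders; the UK Biobank preprocessing script referenced at the end of this section shows how the actual files were created:

    # Hedged sketch of the expected HDF5 layout; all IDs, attributes, and data here are dummies.
    import h5py
    import numpy as np

    with h5py.File("data.h5", "w") as h5:
        grp = h5.create_group("1000001/20208/2_0")   # patientID/fieldID/instanceID (placeholder IDs)
        grp.attrs["studyDesc"] = "CMR"               # optional group-level attributes
        grp.attrs["n_series"] = 1

        # One series stored as a dataset; the key follows the rules above
        # (e.g. "primary", "primary_sagittal", "auxiliary_T1_0", ...).
        data = np.zeros((1, 50, 1, 208, 168), dtype=np.float32)  # Channel : Time : Slice : X : Y (see Data Dimensions below)
        ds = grp.create_dataset("primary", data=data)
        ds.attrs["seriesDesc"] = "CINE_segmented_LAX_4Ch"        # optional dataset-level attributes (placeholder)
        ds.attrs["min_val"] = float(data.min())                  # required for volumetric normalisation modes
        ds.attrs["max_val"] = float(data.max())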

Data Dimensions

The dataset must be 5D, with the following shape:

Channel : Time : Slice : X : Y 
  • Channel: This dimension is used to stack different MRIs from multi-echo or multi-TIeff MRI acquisitions, referred to as "Channels." In the case of multi-contrast MRIs, this dimension can also be used, but only if the images are co-registered. If there is only one channel, the shape of this dimension is 1.
  • Time: For dynamic MRIs (and other dynamic acquisitions), the different time points should be concatenated along this dimension. If there is only one time point, the shape of this dimension is 1.
  • Slice: For 3D MRIs, this dimension stores the different slices. For 2D acquisitions, the shape of this dimension will be 1.
  • X and Y: These represent the in-plane spatial dimensions.

For any other type of data, the dimensions can be reshaped to fit this structure (i.e., unnecessary dimensions can be set to have a shape of 1).
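For instance, a single 2D image or a plain 3D volume can be brought into this layout simply by adding singleton dimensions, as in this hedged numpy sketch (array sizes are arbitrary examples):

    import numpy as np

    img2d = np.random.rand(208, 168)       # a single 2D image (X, Y)
    vol3d = np.random.rand(10, 208, 168)   # a 3D volume (Slice, X, Y)

    img5d = img2d[np.newaxis, np.newaxis, np.newaxis]   # -> (1, 1, 1, 208, 168)
    vol5d = vol3d[np.newaxis, np.newaxis]               # -> (1, 1, 10, 208, 168)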

Note: In this research, the UK Biobank MRI ZIP files were processed using the script available at https://github.com/GlastonburyGroup/CardiacDiffAE_GWAS/blob/master/preprocess/createH5s/createH5_MR_DICOM.py to create the corresponding HDF5 file.

Trained Weights from Hugging Face

The 3D DiffAE models trained on the CINE Cardiac Long Axis MRIs from UK Biobank as part of the research Unsupervised cardiac MRI phenotyping with 3D diffusion autoencoders reveals novel genetic insights are available on Hugging Face.

To use the weights (without this pipeline), you can load them directly using the Hugging Face Transformers library, or you can use them with this pipeline by supplying the --load_hf argument to the main files. For example:

    python Executors/main_diffAE.py --load_hf GlastonburyGroup/UKBBLatent_Cardiac_20208_DiffAE3D_L128_S1701

After loading, the model can be trained further (treating our weights as pretrained weights) or used for inference (following the instructions in the previous section, omitting the --resume --load_best flags).
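If you only need the checkpoint files locally (for example, to inspect them outside this pipeline), a hedged sketch using the huggingface_hub library is shown below; note that this is simply one way to fetch the repository contents, and within this pipeline the download and loading are handled for you by the --load_hf argument:

    # Hedged sketch: downloads a snapshot of the model repository from Hugging Face.
    from huggingface_hub import snapshot_download

    local_dir = snapshot_download(repo_id="GlastonburyGroup/UKBBLatent_Cardiac_20208_DiffAE3D_L128_S1701")
    print(local_dir)  # local path containing the downloaded checkpoint files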

An application has also been hosted on Hugging Face Spaces, where you can use your own MRIs to infer latent representations using the trained 3D DiffAE models.

Citation

If you find this work useful or utilise this pipeline (or any part of it) in your research, please consider citing us:

@article{Ometto2024.11.04.24316700,
            author       = {Ometto, Sara and Chatterjee, Soumick and Vergani, Andrea Mario and Landini, Arianna and Sharapov, Sodbo and Giacopuzzi, Edoardo and Visconti, Alessia and Bianchi, Emanuele and Santonastaso, Federica and Soda, Emanuel M and Cisternino, Francesco and Ieva, Francesca and Di Angelantonio, Emanuele and Pirastu, Nicola and Glastonbury, Craig A},
            title        = {Unsupervised cardiac MRI phenotyping with 3D diffusion autoencoders reveals novel genetic insights},
            elocation-id = {2024.11.04.24316700},
            year         = {2024},
            doi          = {10.1101/2024.11.04.24316700},
            publisher    = {Cold Spring Harbor Laboratory Press},
            url          = {https://www.medrxiv.org/content/early/2024/11/05/2024.11.04.24316700},
            journal      = {medRxiv}
          }  

Credits

This pipeline was developed by Dr Soumick Chatterjee (as part of the Glastonbury Group, Human Technopole, Milan, Italy), based on the NCC1701 pipeline from the paper ReconResNet: Regularised residual learning for MR image reconstruction of Undersampled Cartesian and Radial data. Special thanks to Dr Domenico Iuso for collaborating on enhancing NCC1701 and this pipeline with the latest PyTorch (and related) features, including DeepSpeed, and to Rupali Khatun for her contributions to the complex-valued autoencoders in the pipeline.

DiffAE: Diffusion Autoencoder

The 2D DiffAE model in this repository is based on the paper Diffusion Autoencoders: Toward a Meaningful and Decodable Representation. The repository adapts the code from the original DiffAE repository to work with non-RGB images (e.g., MRIs) and extends it to 3D for processing volumetric images.

If you are using the DiffAE model from this repository, in addition to citing our paper mentioned above, please also cite the original paper:

@inproceedings{preechakul2021diffusion,
      title={Diffusion Autoencoders: Toward a Meaningful and Decodable Representation}, 
      author={Preechakul, Konpat and Chatthee, Nattanat and Wizadwongsa, Suttisak and Suwajanakorn, Supasorn},
      booktitle={IEEE Conference on Computer Vision and Pattern Recognition (CVPR)}, 
      year={2022},
}

pythae: Unifying Generative Autoencoders in Python

For non-diffusion autoencoders (including VAEs), this pipeline utilises and extends (e.g., with additional models, including complex-valued ones) the pythae package. The package has been integrated into our pipeline; to use one of its models, supply 0 as the modelID, the model name (from the list in the package's __init__.py file, located inside Engineering/Engines/WarpDrives/pythaeDrive) as pythae_model, and the relative path (from Engineering/Engines/WarpDrives/pythaeDrive/configs) to the pythae configuration JSON file as pythae_config. The pythae_config argument is optional; if left blank, the default configuration will be used.

For example, for the Factor VAE's configuration file intended for the CelebA dataset, originals/celeba/factor_vae_config.json must be supplied. Default configurations can also be found in the same __init__.py file, which must be updated whenever a new model or configuration is added to the package.

The original configuration files (like the package itself) are intended for a few "toy" image datasets (binary_mnist, celeba, cifar10, dsprites, and mnist). Since CelebA is the most complex of these datasets, its configurations have been chosen as the defaults. However, these configurations may need to be modified to suit your specific tasks.
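Putting this together, a hypothetical invocation using the Factor VAE configuration mentioned above could look like the following. This assumes the command-line flags mirror the parameter names described here and that the model name is registered as factor_vae; check --help and the __init__.py file for the exact argument and model names:

    python Executors/main_recon.py --modelID 0 --pythae_model factor_vae --pythae_config originals/celeba/factor_vae_config.json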

If you are using any of the non-diffusion autoencoder models (including VAEs) from this repository, in addition to citing our paper mentioned above, please also cite the original paper:

@inproceedings{chadebec2022pythae,
        author = {Chadebec, Cl\'{e}ment and Vincent, Louis and Allassonniere, Stephanie},
        booktitle = {Advances in Neural Information Processing Systems},
        editor = {S. Koyejo and S. Mohamed and A. Agarwal and D. Belgrave and K. Cho and A. Oh},
        pages = {21575--21589},
        publisher = {Curran Associates, Inc.},
        title = {Pythae: Unifying Generative Autoencoders in Python - A Benchmarking Use Case},
        volume = {35},
        year = {2022}
}