
refactoring the image writer modules #3595

Closed
wyli opened this issue Jan 6, 2022 · 9 comments · Fixed by #3773
Labels
Feature request Module: data datasets, readers/writers, synthetic data Module: transform data transforms for preprocessing and postprocessing.

Comments

wyli commented Jan 6, 2022

This is a follow-up on the image writer of the previous I/O module proposals.

Goal and anticipated benefits

The current image writing module has been primarily built around the NIfTI specifications using the nibabel package as the backend.
The goal is to redesign the module in order to decouple the universal and the backend-specific implementations.

This will make it easier to bring in various writer backends such as ITK-Python, thereby supporting more image formats and more powerful customisation of the writer module.

related tickets: #2620 #2613 Project-MONAI/MONAILabel#211
proof of concept #3443

Details

The writer module should have (1) universal logic, (2) backend-specific logic, and (3) a mechanism for defaulting/selecting backends. Specifically:

(1) In the context of image writing, the universal logic handles the image-related outputs from the deep learning workflows, such as:

  • fetching data array and metadata
  • arrangements/resampling of spatial/channel dimensions
  • preparing target data types
  • preparing output folder structures, potentially compatible with BIDS (Brain Imaging Data Structure) and NWB (Neurodata Without Borders)
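The first two universal steps (array/metadata handling and dimension arrangement) can be sketched in a few lines. The function name and defaults below are hypothetical, not MONAI's API; it is only meant to illustrate the kind of backend-agnostic preparation described above:

```python
import numpy as np

def prepare_output_array(data, channel_dim=0, output_dtype=np.float32, squeeze_end_dims=True):
    """Illustrative universal-logic step: move the channel dimension to the end
    (channel-last is what many writer backends expect), optionally squeeze
    trailing singleton dimensions, and cast to the target dtype."""
    data = np.moveaxis(np.asarray(data), channel_dim, -1)
    if squeeze_end_dims:
        while data.ndim > 1 and data.shape[-1] == 1:
            data = data[..., 0]
    return data.astype(output_dtype)

out = prepare_output_array(np.zeros((2, 3, 4, 5)), channel_dim=0)
print(out.shape)  # channel dim moved to the end: (3, 4, 5, 2)
```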

(2) The backend-specific logic includes

  • creating data representations according to the backend APIs
  • calling the backend APIs to finish the data writing

(3) The backend selection logic operates based on the user-specified parameters, backend system availability, and current system default configurations.
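A minimal sketch of such a selection mechanism, assuming a simple extension-to-writer registry. The names `register_writer`/`resolve_writer` anticipate the proof of concept discussed below; the bodies and the `backend_available` hook are illustrative only:

```python
# Hypothetical registry mapping filename extensions to candidate writer classes.
SUPPORTED_WRITERS = {}

def register_writer(ext, *writer_classes):
    """Register candidate writer classes for an extension, in order of preference."""
    SUPPORTED_WRITERS[ext.lower()] = writer_classes

def resolve_writer(ext):
    """Return the first registered writer whose backend is available on this system."""
    for cls in SUPPORTED_WRITERS.get(ext.lower(), ()):
        if getattr(cls, "backend_available", lambda: True)():
            return cls
    raise ValueError(f"no available writer backend for '{ext}'")

class NibabelWriter:  # stand-in; a real implementation would wrap nibabel
    @staticmethod
    def backend_available():
        return True  # in practice: try importing nibabel

register_writer(".nii.gz", NibabelWriter)
print(resolve_writer(".nii.gz").__name__)  # NibabelWriter
```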

@wyli wyli added this to MONAI 0.9 Jan 6, 2022
@wyli wyli added Feature request Module: data datasets, readers/writers, synthetic data Module: transform data transforms for preprocessing and postprocessing. labels Jan 6, 2022

wyli commented Jan 12, 2022

#3443 is now a working POC; it provides the following new features (mainly in monai/data/image_writer.py):

  • An ImageWriter base class with backend-agnostic utilities such as resample_if_needed
  • Subclasses (PILWriter, ITKWriter, NibabelWriter) with backend-specific logic, implementing
    • set_data_array
    • set_metadata
    • write
  • register_writer and resolve_writer, so that a suitable writer is selected based on the filename extension
  • FolderLayout for generating organised filenames
  • Revised existing SaveImage transform to use the new writer module.
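As an aside, the filename-generation role of FolderLayout can be sketched as below. This is a hypothetical stand-in written for illustration; the real monai.data.FolderLayout API may differ in names and parameters:

```python
import os

class SimpleFolderLayout:
    """Illustrative FolderLayout-style helper: builds organised output paths
    from a subject id, an optional postfix, and an optional index."""

    def __init__(self, output_dir, postfix="seg", extension=".nii.gz"):
        self.output_dir = output_dir
        self.postfix = postfix
        self.extension = extension

    def filename(self, subject, idx=None):
        name = subject
        if self.postfix:
            name += f"_{self.postfix}"
        if idx is not None:
            name += f"_{idx}"
        return os.path.join(self.output_dir, name + self.extension)

layout = SimpleFolderLayout("/tmp/out")
print(layout.filename("image01"))  # /tmp/out/image01_seg.nii.gz (POSIX path)
```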

These are non-breaking changes; the primary usage will be:

(
    writer_cls(output_dtype=self.output_dtype, scale=self.scale)
    .set_data_array(img, channel_dim=0, squeeze_end_dims=self.squeeze_end_dims)
    .set_metadata(meta_dict=meta_data, resample=self.resample, **self.resample_dict)
    .write(filename, verbose=self.print_log)
)

and it has basic support for .nrrd and .dcm writing:

import numpy as np
import torch

TEST_CASE_3 = [torch.randint(0, 255, (1, 2, 3, 4)), {"filename_or_obj": "testfile0.nrrd"}, ".nrrd", False]
TEST_CASE_4 = [
    np.random.randint(0, 255, (3, 2, 4, 5), dtype=np.uint8),
    {"filename_or_obj": "testfile0.dcm"},
    ".dcm",
    False,
]
Please let me know any high-level comments, then I'll revise the design and submit smaller PRs to add these components into the core codebase.

After these I believe we can deprecate png_writer, png_saver, nifti_writer, nifti_saver.

cc @ericspod @rijobro @Nic-Ma @Project-MONAI/core-reviewers

@wyli wyli mentioned this issue Jan 12, 2022

Nic-Ma commented Jan 13, 2022

Thanks @wyli for the great work to totally enhance our IO part!
The usage API is interesting:

(
    writer_cls(output_dtype=self.output_dtype, scale=self.scale)
    .set_data_array(img, channel_dim=0, squeeze_end_dims=self.squeeze_end_dims)
    .set_metadata(meta_dict=meta_data, resample=self.resample, **self.resample_dict)
    .write(filename, verbose=self.print_log)
)

I am not sure whether this "Java-style" return self is also common practice in the Python world? @ericspod

Thanks.

wyli commented Jan 13, 2022

thanks, it is possible to have a basic approach with the input arguments in one place:

write_cls.write(data, meta_dict, channel_dim, resample, filename, ...)

but

  • the argument list will be very long
  • it'll be difficult to modify/extend the subroutines within this function

the current approach divides the routine into set_data_array, set_metadata, and write parts. I feel it's flexible enough, but I'm happy to change or experiment with any other ideas...
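A toy sketch of why the staged split helps with extension: a subclass can override a single stage without touching the rest of the pipeline, which a monolithic write(data, meta_dict, ...) call would not allow. All class and method bodies here are illustrative stand-ins, not MONAI code:

```python
class BaseWriter:
    """Minimal three-stage writer: set data, set metadata, then write."""

    def set_data_array(self, data, **kwargs):
        self.data = data

    def set_metadata(self, meta_dict=None, **kwargs):
        self.meta = dict(meta_dict or {})

    def write(self, filename, **kwargs):
        # a real writer would serialise here; return the state for inspection
        return filename, self.data, self.meta

class DefaultingWriter(BaseWriter):
    def set_metadata(self, meta_dict=None, **kwargs):
        # extend just this stage: inject a default affine when none is given
        meta = dict(meta_dict or {})
        meta.setdefault("affine", "identity")
        super().set_metadata(meta, **kwargs)

w = DefaultingWriter()
w.set_data_array([1, 2, 3])
w.set_metadata({})
print(w.write("out.nii.gz")[2])  # {'affine': 'identity'}
```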

wyli commented Jan 14, 2022

may want to support metadata from existing objects

img = nibabel.load('img')
writer.set_metadata(img)

this is not implemented but should be feasible without changing the API
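One way this could work without changing the API is duck typing inside set_metadata: accept either a plain dict or a loaded image object exposing nibabel-like attributes. The helper and the fake image class below are purely illustrative:

```python
def to_meta_dict(meta):
    """Normalise a metadata source into a plain dict.

    Accepts either a dict or an object with a nibabel-like `affine` attribute
    (e.g. the result of nibabel.load); anything else is rejected."""
    if isinstance(meta, dict):
        return meta
    if hasattr(meta, "affine"):  # duck-typed nibabel-like image
        return {"affine": meta.affine, "original_affine": meta.affine}
    raise TypeError(f"unsupported metadata source: {type(meta)}")

class FakeNibabelImage:  # stand-in for a nibabel.load(...) result
    affine = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]

print(to_meta_dict(FakeNibabelImage())["affine"][0])  # [1, 0, 0, 0]
```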

ericspod commented Jan 18, 2022

> Thanks @wyli for the great work to totally enhance our IO part! The usage API is interesting:
>
>     (
>         writer_cls(output_dtype=self.output_dtype, scale=self.scale)
>         .set_data_array(img, channel_dim=0, squeeze_end_dims=self.squeeze_end_dims)
>         .set_metadata(meta_dict=meta_data, resample=self.resample, **self.resample_dict)
>         .write(filename, verbose=self.print_log)
>     )
>
> I am not sure whether this "Java-style" return self is also common practice in the Python world? @ericspod
>
> Thanks.

It's rare in Python; it's more of a C++ thing, I feel. I'm not totally sure whether it's a good pattern or not. A lot of the use cases for this involve making up for a shortcoming in a language that would otherwise force a cumbersome chain of method calls in separate statements. It makes sense when you want to provide an API for a configurable sequence of operations. Other language mechanisms, such as variable arguments or operator overloading, can be used instead (C++'s use of << in streams, for example). PyTorch does use it to chain in-place operations on tensors (add_, sub_) to closely replicate the structure of an equivalent expression using operators; that makes sense, since Python doesn't provide explicit in-place operator expressions.
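For context, here is a minimal sketch of the "return self" fluent pattern under discussion. The Query class is a made-up example, unrelated to MONAI:

```python
class Query:
    """Toy builder: each configuration method returns self so calls can chain."""

    def __init__(self):
        self.parts = []

    def select(self, cols):
        self.parts.append(f"SELECT {cols}")
        return self  # returning self is what enables chaining

    def where(self, cond):
        self.parts.append(f"WHERE {cond}")
        return self

    def build(self):
        return " ".join(self.parts)

print(Query().select("*").where("x > 1").build())  # SELECT * WHERE x > 1
```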

Here, since any use beyond these methods being called in this specific sequence probably isn't going to arise with these writers, we could just have the call sequence as normal and not worry about returning self:

writer_obj = writer_cls(output_dtype=self.output_dtype, scale=self.scale) 
writer_obj.set_data_array(img, channel_dim=0, squeeze_end_dims=self.squeeze_end_dims) 
writer_obj.set_metadata(meta_dict=meta_data, resample=self.resample, **self.resample_dict) 
writer_obj.write(filename, verbose=self.print_log) 

@ericspod

I added a few comments for #3443 but otherwise I think it's in the right direction. I see that code has been moved to #3674 so the comments can be ported over.

wyli commented Jan 18, 2022

thanks, I'll update #3674 #3443 to address these comments.

removed return in c6d8a9f

@deepib deepib moved this to In Progress in MONAI 0.9 Jan 19, 2022
Repository owner moved this from In Progress to Done in MONAI 0.9 Feb 10, 2022

wyli commented Feb 11, 2022

with the latest monai.transforms.SaveImage it's now possible to have:

from monai.data import ITKReader, ITKWriter
from monai.transforms import LoadImage, SaveImage

loader = LoadImage(reader=ITKReader)
data, meta = loader("avg152T1_LR_nifti.nii.gz")
saver = SaveImage(output_ext=".nrrd", writer=ITKWriter)
saver(data, meta)

if you are still interested in converting the format in Project-MONAI/MONAILabel#211 @SachidanandAlle @diazandr3s

@SachidanandAlle

We can start using this from the core API directly; we were waiting for this to be available.
