Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

2583 Add DatasetSummary #2616

Merged
merged 13 commits into from
Jul 30, 2021

Conversation

yiheng-wang-nv
Copy link
Contributor

Signed-off-by: Yiheng Wang vennw@nvidia.com

Fixes #2583 .

Description

For 3D image segmentation tasks, users may not sure how to set a reasonable voxel spacing value, as well as the normalization parameters, this PR implements a class to address these issues.

Status

Ready

Types of changes

  • Non-breaking change (fix or new feature that would not break existing functionality).
  • Breaking change (fix or new feature that would cause existing functionality to change).
  • New tests added to cover the changes.
  • Integration tests passed locally by running ./runtests.sh -f -u --net --coverage.
  • Quick tests passed locally by running ./runtests.sh --quick --unittests.
  • In-line docstrings updated.
  • Documentation updated, tested make html command in the docs/ folder.

Signed-off-by: Yiheng Wang <vennw@nvidia.com>
Signed-off-by: Yiheng Wang <vennw@nvidia.com>
@yiheng-wang-nv
Copy link
Contributor Author

The initial implementation requires joblib, let me choose another way and then reopen this PR.

Signed-off-by: Yiheng Wang <vennw@nvidia.com>
@yiheng-wang-nv
Copy link
Contributor Author

Hi all, could you please help to review this PR?
I think this is needed, but I'm not sure if I put the class into the right place. Or should I use a tutorial rather than a MONAI feature here?

@madil90
Copy link
Contributor

madil90 commented Jul 19, 2021

/build

1 similar comment
@madil90
Copy link
Contributor

madil90 commented Jul 19, 2021

/build

@wyli
Copy link
Contributor

wyli commented Jul 19, 2021

Hi all, could you please help to review this PR?
I think this is needed, but I'm not sure if I put the class into the right place. Or should I use a tutorial rather than a MONAI feature here

Hi all, could you please help to review this PR?
I think this is needed, but I'm not sure if I put the class into the right place. Or should I use a tutorial rather than a MONAI feature here?

I think this could be part of monai/data/utils.

  • The implemention could use pytorch or monai dataloader instead of the multiprocessing lib
  • it's currently hard coded to use a datalist as in put, but it should be more flexible to take monai dataset as an input
  • would be great to reuse this utility to generate the relevant statistics of the dataset, instead of hardcoding the type of stats to be median/min/max

@yiheng-wang-nv
Copy link
Contributor Author

Hi all, could you please help to review this PR?
I think this is needed, but I'm not sure if I put the class into the right place. Or should I use a tutorial rather than a MONAI feature here

Hi all, could you please help to review this PR?
I think this is needed, but I'm not sure if I put the class into the right place. Or should I use a tutorial rather than a MONAI feature here?

I think this could be part of monai/data/utils.

  • The implemention could use pytorch or monai dataloader instead of the multiprocessing lib
  • it's currently hard coded to use a datalist as in put, but it should be more flexible to take monai dataset as an input
  • would be great to reuse this utility to generate the relevant statistics of the dataset,
    instead of hardcoding the type of stats to be median/min/max

Thanks @wyli . My question is that when the dataset has images in different spatial sizes, it cannot support batch_size > 1 directly. Therefore, this PR used multiple processing instead. Do you have any suggestions?

@wyli
Copy link
Contributor

wyli commented Jul 20, 2021

Hi all, could you please help to review this PR?
I think this is needed, but I'm not sure if I put the class into the right place. Or should I use a tutorial rather than a MONAI feature here

Hi all, could you please help to review this PR?
I think this is needed, but I'm not sure if I put the class into the right place. Or should I use a tutorial rather than a MONAI feature here?

I think this could be part of monai/data/utils.

  • The implemention could use pytorch or monai dataloader instead of the multiprocessing lib

  • it's currently hard coded to use a datalist as in put, but it should be more flexible to take monai dataset as an input

  • would be great to reuse this utility to generate the relevant statistics of the dataset,

    instead of hardcoding the type of stats to be median/min/max

Thanks @wyli . My question is that when the dataset has images in different spatial sizes, it cannot support batch_size > 1 directly. Therefore, this PR used multiple processing instead. Do you have any suggestions?

the dataloader is implemented with a proper multiprocessing support, I think you can use it with batch_size=1

Signed-off-by: Yiheng Wang <vennw@nvidia.com>
@yiheng-wang-nv
Copy link
Contributor Author

Hi all, could you please help to review this PR?
I think this is needed, but I'm not sure if I put the class into the right place. Or should I use a tutorial rather than a MONAI feature here

Hi all, could you please help to review this PR?
I think this is needed, but I'm not sure if I put the class into the right place. Or should I use a tutorial rather than a MONAI feature here?

I think this could be part of monai/data/utils.

  • The implemention could use pytorch or monai dataloader instead of the multiprocessing lib
  • it's currently hard coded to use a datalist as in put, but it should be more flexible to take monai dataset as an input
  • would be great to reuse this utility to generate the relevant statistics of the dataset,
    instead of hardcoding the type of stats to be median/min/max

Hi @wyli , thanks for the comments. The code may not be able to be in monai/data/utils, since it needs to import dataset and dataloader and putting into monai/data/utils could lead to circular import.

Now I've switched to use monai dataloader.
The input has been changed to use monai dataset.
As for the write_metrics_reports handler, it needs to calculate with all accumulated image intensities. I think the way I updated in the latest PR may be faster (see calculate_statistics function with can get mean, std, max and min and do not need to accumulate all intensities).

Could you please help to re-review the code, thanks!

@yiheng-wang-nv
Copy link
Contributor Author

Hi @ericspod , I've updated the code according to your comments, could you please help to re-review it, thanks!

@wyli
Copy link
Contributor

wyli commented Jul 28, 2021

looks like a useful utility, I'd suggest changing its name to DatasetSummary, what do you think?

all_intensities.append(intensities)

all_intensities = list(chain(*all_intensities))
self.data_min_percentile, self.data_max_percentile = np.percentile(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're still accumulating all the foreground pixel values from all the images here, unless we have some way of doing an incremental percentile calculation we are stuck with that.

@ericspod ericspod self-requested a review July 28, 2021 23:51
Copy link
Member

@ericspod ericspod left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we're good with a few minor changes possibly.

yiheng-wang-nv and others added 4 commits July 29, 2021 23:04
Signed-off-by: Yiheng Wang <vennw@nvidia.com>
Signed-off-by: Yiheng Wang <vennw@nvidia.com>
Signed-off-by: Yiheng Wang <vennw@nvidia.com>
@yiheng-wang-nv
Copy link
Contributor Author

Hi @wyli , I've changed the name, and also add a test case for anisotropic spacings.

Signed-off-by: Yiheng Wang <vennw@nvidia.com>
@wyli
Copy link
Contributor

wyli commented Jul 30, 2021

Perhaps in another iteration we can change the code to use some online histogram estimation such as https://github.com/maki-nage/distogram

@wyli wyli enabled auto-merge (squash) July 30, 2021 09:24
@wyli wyli changed the title 2583 Add DatasetCalculator 2583 Add DatasetSummary Jul 30, 2021
@wyli wyli merged commit 390fe7f into Project-MONAI:dev Jul 30, 2021
@yiheng-wang-nv yiheng-wang-nv deleted the 2583-add-dataset-calculator branch July 30, 2021 14:04
@yiheng-wang-nv
Copy link
Contributor Author

Mark: later I will update a tutorial in https://github.com/Project-MONAI/tutorials/tree/master/modules/dynunet_pipeline to show how can we get the spacings and normalization parameters.

wyli pushed a commit to wyli/MONAI that referenced this pull request Aug 3, 2021
* Add DatasetCalculator

Signed-off-by: Yiheng Wang <vennw@nvidia.com>

* update docstring

Signed-off-by: Yiheng Wang <vennw@nvidia.com>

* use multiprocessing

Signed-off-by: Yiheng Wang <vennw@nvidia.com>

* update to use dataset and other places

Signed-off-by: Yiheng Wang <vennw@nvidia.com>

* update to support array return

Signed-off-by: Yiheng Wang <vennw@nvidia.com>

* update with new testcases and change name

Signed-off-by: Yiheng Wang <vennw@nvidia.com>

* update min test

Signed-off-by: Yiheng Wang <vennw@nvidia.com>

* update unittest

Signed-off-by: Yiheng Wang <vennw@nvidia.com>

* fix vstack error

Signed-off-by: Yiheng Wang <vennw@nvidia.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants