Is your feature request related to a problem? Please describe.
PyTorch's DataLoader class is used in many deep learning training applications to load and pre-process training data before feeding it to the AI model.
Since PyTorch's DataLoader runs its workers in multiple processes, it is hard to use cuCIM's scikit-image-style APIs (which make use of CUDA) in the DataLoader's pre-transformations, due to CUDA context issues.
It would be nice to provide a way/example to use cuCIM with deep learning frameworks such as PyTorch.
Describe the solution you'd like
PyTorch's DataLoader works like this. It would be nice to have a DataLoader-like utility class that mimics PyTorch's DataLoader behavior but is implemented with Dask (dask-cuda) to parallelize data loading, i.e., one that provides a generator/iterator yielding batches of processed image data (a sketch of the idea follows).
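A minimal sketch of what such a utility might look like, built on dask.delayed; the class name DaskDataLoader and the load_fn/transform_fn hooks are hypothetical, not an existing Dask or cuCIM API:

```python
import dask
from dask import delayed

class DaskDataLoader:
    """Yield batches of transformed samples, computed in parallel by Dask.

    With dask-cuda, connecting a distributed client to a LocalCUDACluster
    would let the per-sample transforms run on GPU workers.
    """

    def __init__(self, file_paths, load_fn, transform_fn, batch_size=4):
        self.file_paths = file_paths
        self.load_fn = load_fn            # e.g. reads one image from disk
        self.transform_fn = transform_fn  # e.g. a cuCIM/CuPy-based transform
        self.batch_size = batch_size

    def __iter__(self):
        for start in range(0, len(self.file_paths), self.batch_size):
            chunk = self.file_paths[start:start + self.batch_size]
            # Build a lazy task per sample; Dask schedules them in parallel.
            tasks = [delayed(self.transform_fn)(delayed(self.load_fn)(p))
                     for p in chunk]
            yield dask.compute(*tasks)
```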
Describe alternatives you've considered
To use cuCIM in the training pipeline, we currently move the GPU-accelerated pre-transforms out of the PyTorch DataLoader's transformation chain (built with Compose) and into the main process: the GPU-based batch pre-transformation is applied right after the DataLoader returns the CPU-loaded/pre-transformed training data, and right before the batch is fed to the AI model. This avoids the CUDA context issues (see the sketch below).
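A minimal sketch of that pattern; `dataset`, `model`, and `gpu_batch_transform` are placeholders for the user's dataset, network, and a cuCIM/CuPy-based batch transform, not existing APIs:

```python
import torch
from torch.utils.data import DataLoader

# CPU-only loading/pre-transforms run inside the DataLoader's worker
# processes (num_workers > 0), where CUDA must not be touched.
loader = DataLoader(dataset, batch_size=32, num_workers=4, pin_memory=True)

for batch, labels in loader:
    batch = batch.cuda(non_blocking=True)
    # The GPU-based pre-transform runs here, in the main process, where
    # the CUDA context is valid. `gpu_batch_transform` stands in for a
    # cuCIM/CuPy-based transform.
    batch = gpu_batch_transform(batch)
    output = model(batch)
```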
It would be good if we also provided a documented example of that approach.

Additional context
Relevant information regarding CuPy + PyTorch.
Using Numba to get the CUDA context.
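To illustrate the CuPy + PyTorch and Numba points above: a minimal sketch of zero-copy interoperability via DLPack, with Numba used to make sure a CUDA context exists. This assumes recent CuPy (>= 10) and PyTorch (>= 1.10) and is not taken from cuCIM itself:

```python
import cupy as cp
import torch
from numba import cuda

# Ensure a CUDA context exists in this process before mixing libraries
# that each expect an initialized context.
cuda.select_device(0)

# Zero-copy exchange between PyTorch and CuPy via DLPack.
x_torch = torch.arange(6, device="cuda", dtype=torch.float32)
x_cupy = cp.from_dlpack(x_torch)     # CuPy view of the same GPU memory
y_cupy = cp.sqrt(x_cupy)             # any CuPy/cuCIM-style GPU operation
y_torch = torch.from_dlpack(y_cupy)  # back to PyTorch, still on the GPU
```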
NVTabular's PyTorch dataloader (which is built on Dask) may be a good reference here. I'll be happy to help advise on this work and clarify what NVTabular is (and isn't) doing.
Unfortunately, I have come across some issues in the cuCIM codebase that are blocking this Dask-based data-loading solution.
For example, while the array transformation in the cuCIM transform image_rotate_90() should be compatible with a Dask Array input, there is an explicit type check here that throws a TypeError whenever one tries to apply cuCIM transforms to a Dask Array.
In order to allow Dask to schedule cuCIM operations, we will likely want to make this check more of a duck-type check that tests for the necessary array interface, so that Dask Arrays can pass it (see the sketch below).
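A minimal sketch of such a duck-type check; the helper name is illustrative, and the exact attributes to test would depend on what the transform actually uses:

```python
import dask.array as da

def _supports_gpu_transform(img):
    """Duck-type check to replace `isinstance(img, cupy.ndarray)`.

    Accept any object that exposes the attributes the transform needs:
    basic array metadata plus either the CUDA array interface (CuPy and
    friends) or Dask Array semantics (lazy, chunked arrays).
    """
    has_metadata = hasattr(img, "shape") and hasattr(img, "dtype")
    on_device = hasattr(img, "__cuda_array_interface__")
    is_dask = isinstance(img, da.Array)
    return has_metadata and (on_device or is_dask)
```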