-
Thanks for reporting this, I think the … Have you tried … or …?
-
@wyli Thanks for the quick reply! With spawn (…) …
With … working results with …
-
@wyli This topic is sadly not resolved, however. Using ThreadDataLoader modifies the logic of my real code; I believe the affine matrix for the spatial transforms is getting lost in the process. The transforms, similar to DeepEdit, look like this:

```python
# Initial transforms on the CPU, which does not hurt since they are executed asynchronously and only once
InitLoggerd(args),  # necessary if the dataloader runs in an extra thread / process
LoadImaged(keys=("image", "label"), reader="ITKReader"),
EnsureChannelFirstd(keys=("image", "label")),
NormalizeLabelsInDatasetd(keys="label", label_names=labels, device=device),
Orientationd(keys=["image", "label"], axcodes="RAS"),
Spacingd(keys=["image", "label"], pixdim=spacing),
CropForegroundd(keys=("image", "label"), source_key="image", select_fn=threshold_foreground),
ScaleIntensityRanged(keys="image", a_min=0, a_max=43, b_min=0.0, b_max=1.0, clip=True),  # 0.05 and 99.95 percentiles of the spleen HUs
### Random Transforms ###
RandCropByPosNegLabeld(keys=("image", "label"), label_key="label", spatial_size=args.train_crop_size, pos=0.6, neg=0.4) if args.train_crop_size is not None else NoOpd(),
DivisiblePadd(keys=["image", "label"], k=64, value=0) if args.inferer == "SimpleInferer" else NoOpd(),  # UNet needs this
RandFlipd(keys=("image", "label"), spatial_axis=[0], prob=0.10),
RandFlipd(keys=("image", "label"), spatial_axis=[1], prob=0.10),
RandFlipd(keys=("image", "label"), spatial_axis=[2], prob=0.10),
RandRotate90d(keys=("image", "label"), prob=0.10, max_k=3),
# Move to GPU
ToTensord(keys=("image", "label"), device=device, track_meta=False),
```

With the DataLoader the resulting shape is torch.Size([1, 3, 224, 224, 320]) …
I have run into this error before and I believe this is due to …
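As a quick way to pin this down, a check like the following shows whether the affine survives into a batch (a minimal sketch, not code from this thread; check_meta is a hypothetical helper and the "image" key matches the pipeline above):

```python
# Minimal sketch, not code from this thread: check_meta is a hypothetical helper,
# and the "image" key matches the dictionary keys used in the pipeline above.
from monai.data import MetaTensor

def check_meta(batch):
    img = batch["image"]
    if isinstance(img, MetaTensor):
        print("image is a MetaTensor, affine:\n", img.affine)
    else:
        print("image is a plain", type(img).__name__, "- meta/affine information is gone")
```

With the plain DataLoader this should report a MetaTensor carrying the affine; a plain Tensor here means the spatial transforms downstream never see the original affine.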
-
yes, please use …
-
Does not change anything. Do I have to call that sooner? The ToTensord call is after all the relevant transforms imo.
-
Same problem, even with ToTensord right at the start:

```python
InitLoggerd(args),  # necessary if the dataloader runs in an extra thread / process
LoadImaged(keys=("image", "label"), reader="ITKReader"),
ToTensord(keys=("image", "label"), device=device, track_meta=True),
EnsureChannelFirstd(keys=("image", "label")),
NormalizeLabelsInDatasetd(keys="label", label_names=labels, device=device),
Orientationd(keys=["image", "label"], axcodes="RAS"),
Spacingd(keys=["image", "label"], pixdim=spacing),
CropForegroundd(keys=("image", "label"), source_key="image", select_fn=threshold_foreground),
ScaleIntensityRanged(keys="image", a_min=0, a_max=43, b_min=0.0, b_max=1.0, clip=True),  # 0.05 and 99.95 percentiles of the spleen HUs
```
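One thing worth double-checking with a pipeline like this (a hedged suggestion, not something established in this thread) is MONAI's global meta-tracking switch: if it is disabled anywhere, the loaders return plain tensors regardless of the per-transform track_meta arguments.

```python
# Hedged sketch: confirm that MONAI's global meta tracking is enabled
# (assumption: MONAI >= 0.9, where MetaTensor-based tracking is the default).
from monai.data import set_track_meta

set_track_meta(True)  # if this is False anywhere, plain torch.Tensors are returned
```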
-
I just changed my code to Dataset instead of PersistentDataset, just to be sure this is not a caching effect. Same results with Dataset.
-
ok... perhaps this …
-
Same result, so the transforms still don't work as expected, plus the code then crashes later, of course, since meta information is missing. The code is extremely similar to this one: https://github.com/Project-MONAI/MONAI/blob/dev/monai/apps/deepedit/transforms.py#L86, so it should check whether the tensor is a plain Tensor or a MetaTensor.
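For reference, the branching that such transforms rely on looks roughly like this (a paraphrased sketch, not a verbatim copy of the linked file; get_affine is a hypothetical helper):

```python
# Paraphrased sketch of the MetaTensor / meta_dict branching used by DeepEdit-style
# transforms; get_affine is a hypothetical helper, and the "<key>_meta_dict" naming
# follows MONAI's dictionary convention.
from monai.data import MetaTensor

def get_affine(data, key="image"):
    img = data[key]
    if isinstance(img, MetaTensor):
        return img.affine                      # new style: affine lives on the MetaTensor
    meta = data.get(f"{key}_meta_dict", {})
    return meta.get("affine")                  # old style: affine lives in the meta dict
```

When the loader hands back plain Tensors without a meta dict, both branches come up empty, which matches the missing-affine behaviour described above.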
-
@wyli Would be great if you find the time to check that out 😊 As I said, with DataLoader it works, with ThreadDataLoader it doesn't. Just for completeness I'll paste the calling code for both below.

```diff
- train_loader = DataLoader(
-     train_ds, shuffle=True, num_workers=args.num_workers, batch_size=1, multiprocessing_context='spawn', persistent_workers=True,
+ train_loader = ThreadDataLoader(
+     train_ds, shuffle=True, num_workers=args.num_workers, batch_size=1, multiprocessing_context='spawn', use_thread_workers=True#, persistent_workers=True,
```
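As a side note, MONAI's fast-training guidance generally pairs ThreadDataLoader with a cached dataset and num_workers=0, so a single background thread feeds batches instead of spawned worker processes. A minimal sketch (train_files and train_transforms are placeholders, not names from this thread):

```python
# Sketch of the usual ThreadDataLoader setup; train_files and train_transforms are
# placeholders, not names from this thread.
from monai.data import CacheDataset, ThreadDataLoader

train_ds = CacheDataset(data=train_files, transform=train_transforms, cache_rate=1.0)
train_loader = ThreadDataLoader(train_ds, batch_size=1, shuffle=True, num_workers=0)
```

Whether that configuration sidesteps the MetaTensor issue here is a separate question; it only shows the usage the class is designed around.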
-
sure, I forgot to mention that the issue of …
-
Ah no worries there, I am using args.num_workers==1 by default. Good to know anyway, then I won't increase it. I found old code mentioning setting num_workers to 0, but that no longer works.
-
Hi @diazandr3s, I'm not sure about the root cause of this deepedit transform + ThreadDataLoader issue, please have a look if you have time, thanks! (Converting this to a discussion for now, please feel free to create a bug report if it's not a usage question.)
-
Since changing one single line, DataLoader to ThreadDataLoader, changes how the transforms work and renders ThreadDataLoader unusable for me, I would consider this a bug. I will paste below how the different batchdata looks after the LoadImaged() call (done with PrintDatad(), a transform I added). It should be easy to debug for the person who created the code. The problem is that with use_thread_workers=True, Tensors are returned instead of MetaTensors, which means the meta dicts get lost. I guess this is the dict used by the other transforms, and they just ignore the other place where this information is stored: image_meta_dict/original_affine.

With use_thread_workers=True: …

With use_thread_workers not set: …
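For anyone reproducing this, a debug transform along these lines is enough to see the type change between the two loaders (a hypothetical reconstruction; the PrintDatad used in this thread is not shown):

```python
# Hypothetical reconstruction of a PrintDatad-like debug transform (the original used
# in this thread is not shown); prints type, shape and meta availability per key.
from monai.transforms import MapTransform

class PrintDatad(MapTransform):
    def __call__(self, data):
        d = dict(data)
        for key in self.key_iterator(d):
            item = d[key]
            has_meta = getattr(item, "meta", None) is not None  # MetaTensor exposes .meta
            print(f"{key}: {type(item).__name__}, shape={tuple(item.shape)}, meta={has_meta}")
        return d
```

Dropping PrintDatad(keys=("image", "label")) right after LoadImaged should, per the behaviour described above, print MetaTensor with the plain DataLoader and Tensor once use_thread_workers=True is set.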
-
@wyli I do have some more context for you now. The …
I now get: …
Not sure if it is intended that changing a single dataloader flag leads to a completely different returned data type. So the affine information is not getting read correctly because the loaded image and label are not of type MetaTensor. Even by explicitly adding a … Maybe that helps debugging the issue.
-
Describe the bug
I just tried to get some sample code for #6626 but ran into a warning I have seen many times before. The problem appears when the transforms push the data to the GPU and the data is then handed over from the DataLoader thread to the main thread.
This is not a hard bug, but it is very annoying since the warning gets spammed a lot.
A temporary workaround I found is to add persistent_workers=True to the DataLoader; then the warning is only shown at the end of the program, sometimes never.
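A minimal sketch of that workaround (train_ds stands in for the dataset from the reproduction code, which is not shown here):

```python
# Sketch of the persistent_workers workaround; train_ds stands in for the dataset from
# the reproduction code (not shown here), whose transforms move data to the GPU.
from monai.data import DataLoader

train_loader = DataLoader(
    train_ds,
    batch_size=1,
    num_workers=1,
    multiprocessing_context="spawn",  # needed when CUDA is initialised inside the workers
    persistent_workers=True,          # keeps workers alive, so the warning is not spammed
)
```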
Warning message:
To Reproduce
Run this code, minimal sample:
Expected behavior
No CUDA warnings.
Environment
Verified on different environments.
Additional context
Adding an evaluator further complicates the warnings and a new warning is now shown:
The code for that: