
Adding Thread Worker Option to ThreadDataLoader #4252

Merged: 25 commits merged into Project-MONAI:dev on Jun 6, 2022

Conversation

ericspod
Member

Signed-off-by: Eric Kerfoot eric.kerfoot@kcl.ac.uk

Description

Adds the ability to run the workers in ThreadDataLoader as threads instead of processes. This is intended as a fix for Windows, where process-spawning semantics cause issues.
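
As a rough usage sketch of the new option (the use_thread_workers flag is the parameter added by this PR, as used in the test diff further down; the synthetic dataset is only for illustration):

# Minimal sketch: run the loader workers as threads in the current process.
# The use_thread_workers flag is the option added by this PR; the data are synthetic.
import numpy as np
from monai.data import Dataset, ThreadDataLoader

data = [{"image": np.random.rand(1, 32, 32).astype("float32")} for _ in range(8)]
ds = Dataset(data=data)

# With use_thread_workers=True the workers run as threads rather than spawned
# processes, which avoids the Windows process-spawning issues mentioned above.
loader = ThreadDataLoader(ds, num_workers=2, use_thread_workers=True, batch_size=2, shuffle=True)

for batch in loader:
    print(batch["image"].shape)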

Status

Work in progress

Types of changes

  • Non-breaking change (fix or new feature that would not break existing functionality).
  • Breaking change (fix or new feature that would cause existing functionality to change).
  • New tests added to cover the changes.
  • Integration tests passed locally by running ./runtests.sh -f -u --net --coverage.
  • Quick tests passed locally by running ./runtests.sh --quick --unittests --disttests.
  • In-line docstrings updated.
  • Documentation updated, tested make html command in the docs/ folder.

ericspod and others added 5 commits May 10, 2022 15:04
Signed-off-by: Eric Kerfoot <eric.kerfoot@kcl.ac.uk>
Signed-off-by: Eric Kerfoot <eric.kerfoot@kcl.ac.uk>
Signed-off-by: Eric Kerfoot <eric.kerfoot@kcl.ac.uk>
@Nic-Ma
Contributor

Nic-Ma commented May 10, 2022

Hi @ericspod ,

I didn't check the details of this PR as it's still WIP, but I want to raise a concern:
we didn't enable multiple worker threads for ThreadDataLoader because most of MONAI's random transforms are not thread-safe.
What do you think about it?
CC @wyli @rijobro

Thanks in advance.

@ericspod
Member Author

I'm still dealing with an error that's showing up as the test failures. For those transforms that are thread-safe this would be an enhancement; the unsafe ones are marked by inheriting from ThreadUnsafe, but that may not be sufficient. None of the random transforms are thread-safe because they share random states. This was meant to be a potential fix for some Windows issues that I wanted to put out there, but I'll leave it as a WIP for now.
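
As a toy illustration of that shared-state problem (plain Python, not MONAI code): two or more threads drawing from one random state interleave their draws depending on scheduling, so which value a given sample receives is not reproducible even with a fixed seed.

import threading
import numpy as np

shared_rng = np.random.RandomState(0)  # stands in for a random transform's shared state
results = {}

def apply_random_transform(sample_id):
    # The draw order across threads depends on scheduling, so the value assigned
    # to a given sample can change from run to run even though the seed is fixed.
    results[sample_id] = shared_rng.uniform()

threads = [threading.Thread(target=apply_random_transform, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results)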

ericspod and others added 6 commits May 10, 2022 18:14
Signed-off-by: Eric Kerfoot <eric.kerfoot@kcl.ac.uk>
Signed-off-by: Eric Kerfoot <eric.kerfoot@kcl.ac.uk>
Signed-off-by: Eric Kerfoot <eric.kerfoot@kcl.ac.uk>
Signed-off-by: Eric Kerfoot <eric.kerfoot@kcl.ac.uk>
@wyli
Contributor

wyli commented May 26, 2022

@ericspod @Nic-Ma I tried this PR with the fast training test; it works fine and is slightly faster on my desktop (34s vs 36s). This is how I ran it:

diff --git a/tests/test_integration_fast_train.py b/tests/test_integration_fast_train.py
index 4dbb70b..8271a47 100644
--- a/tests/test_integration_fast_train.py
+++ b/tests/test_integration_fast_train.py
@@ -151,8 +151,8 @@ class IntegrationFastTrain(DistTestCase):
         train_ds = CacheDataset(data=train_files, transform=train_transforms, cache_rate=1.0, num_workers=8)
         val_ds = CacheDataset(data=val_files, transform=val_transforms, cache_rate=1.0, num_workers=5)
         # disable multi-workers because `ThreadDataLoader` works with multi-threads
-        train_loader = ThreadDataLoader(train_ds, num_workers=0, batch_size=4, shuffle=True)
-        val_loader = ThreadDataLoader(val_ds, num_workers=0, batch_size=1)
+        train_loader = ThreadDataLoader(train_ds, num_workers=2, use_thread_workers=True, batch_size=4, shuffle=True)
+        val_loader = ThreadDataLoader(val_ds, num_workers=2, use_thread_workers=True, batch_size=1)
 
         loss_function = DiceCELoss(to_onehot_y=True, softmax=True, squared_pred=True, batch=True)
         model = UNet(

I think we should merge this one...

@ericspod
Member Author

We do know there is a thread-safety issue with the many transforms that use the random state: things can sometimes train faster, but there will be race conditions that may preclude reproducibility. I need time to consider possible solutions. The purpose of this addition is to permit faster operation in some cases, but it also allows us to debug transform sequences in one process with a single worker thread.
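
For the debugging use case, a hedged sketch (the flag name matches the diff above; the Lambdad transform and inspect function are placeholders showing where an in-process breakpoint would land):

import numpy as np
from monai.data import Dataset, ThreadDataLoader
from monai.transforms import Compose, Lambdad

def inspect(x):
    # With a single thread worker this runs in the same process as the training
    # script, so breakpoints and print statements here are straightforward to hit.
    return x * 2.0

data = [{"image": np.ones((1, 8, 8), dtype="float32")} for _ in range(4)]
ds = Dataset(data=data, transform=Compose([Lambdad(keys="image", func=inspect)]))

loader = ThreadDataLoader(ds, num_workers=1, use_thread_workers=True, batch_size=2)
for batch in loader:
    print(batch["image"].mean())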

ericspod and others added 13 commits May 26, 2022 14:30
Signed-off-by: Eric Kerfoot <eric.kerfoot@kcl.ac.uk>
Signed-off-by: Eric Kerfoot <eric.kerfoot@kcl.ac.uk>
Signed-off-by: Eric Kerfoot <eric.kerfoot@kcl.ac.uk>
Signed-off-by: Eric Kerfoot <eric.kerfoot@kcl.ac.uk>
Signed-off-by: Eric Kerfoot <eric.kerfoot@kcl.ac.uk>
Signed-off-by: Eric Kerfoot <eric.kerfoot@kcl.ac.uk>
@ericspod ericspod marked this pull request as ready for review June 5, 2022 19:08
@ericspod ericspod enabled auto-merge (squash) June 5, 2022 19:53
Contributor

@Nic-Ma left a comment


Overall it looks good to me.
I've put some comments inline.

Thanks.

monai/data/thread_buffer.py (review comment, resolved)
@wyli
Contributor

wyli commented Jun 6, 2022

/build

@ericspod ericspod merged commit 22924f5 into Project-MONAI:dev Jun 6, 2022
@ericspod ericspod deleted the thread_dataloader_extension branch June 6, 2022 23:35