Adding Thread Worker Option to ThreadDataLoader #4252
Conversation
Signed-off-by: Eric Kerfoot <eric.kerfoot@kcl.ac.uk>
for more information, see https://pre-commit.ci
I'm still dealing with an error that's shown up as the failures. For those transforms that are thread-safe this would be an enhancement; we have those that aren't safe as inheriting from
Signed-off-by: Eric Kerfoot <eric.kerfoot@kcl.ac.uk>
for more information, see https://pre-commit.ci
@ericspod @Nic-Ma I tried this PR with the fast training test; it works fine and it's slightly faster on my desktop (34s vs 36s). This is how I run it:

```diff
diff --git a/tests/test_integration_fast_train.py b/tests/test_integration_fast_train.py
index 4dbb70b..8271a47 100644
--- a/tests/test_integration_fast_train.py
+++ b/tests/test_integration_fast_train.py
@@ -151,8 +151,8 @@ class IntegrationFastTrain(DistTestCase):
     train_ds = CacheDataset(data=train_files, transform=train_transforms, cache_rate=1.0, num_workers=8)
     val_ds = CacheDataset(data=val_files, transform=val_transforms, cache_rate=1.0, num_workers=5)
     # disable multi-workers because `ThreadDataLoader` works with multi-threads
-    train_loader = ThreadDataLoader(train_ds, num_workers=0, batch_size=4, shuffle=True)
-    val_loader = ThreadDataLoader(val_ds, num_workers=0, batch_size=1)
+    train_loader = ThreadDataLoader(train_ds, num_workers=2, use_thread_workers=True, batch_size=4, shuffle=True)
+    val_loader = ThreadDataLoader(val_ds, num_workers=2, use_thread_workers=True, batch_size=1)
     loss_function = DiceCELoss(to_onehot_y=True, softmax=True, squared_pred=True, batch=True)
     model = UNet(
```
I think we should merge this one...
We do know there is a thread-safety issue with many transforms which use the random state: things can sometimes train faster, but there will be race conditions which may preclude reproducibility. I need time to consider possible solutions to this. The purpose of this addition was to permit faster operation in some cases, but it also allows us to debug transform sequences in one process with a single worker thread.
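To illustrate the reproducibility concern, here is a small sketch (not MONAI code; `per_thread_draws` is a hypothetical helper): when worker threads share one RNG, their draws interleave nondeterministically, but giving each thread its own deterministically seeded generator keeps every worker's stream reproducible from run to run.

```python
import random
import threading

def per_thread_draws(seed, n=5, num_workers=2):
    """Each worker thread uses its own deterministically seeded RNG,
    so the values each worker produces are reproducible run to run."""
    results = {}

    def worker(idx):
        # independent stream per thread, derived from the base seed
        rng = random.Random(seed * 1000 + idx)
        results[idx] = [rng.random() for _ in range(n)]

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

With a shared `random.Random` instance instead, the per-worker lists would depend on thread scheduling, which is the race the comment above describes.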
Signed-off-by: Eric Kerfoot <eric.kerfoot@kcl.ac.uk>
Signed-off-by: Eric Kerfoot <eric.kerfoot@kcl.ac.uk>
for more information, see https://pre-commit.ci
Overall it looks good to me.
I put some comments inline.
Thanks.
/build
Signed-off-by: Eric Kerfoot eric.kerfoot@kcl.ac.uk
Description
Adds the ability to run workers in ThreadDataLoader as threads instead of processes. This is a fix for Windows, where we have issues with its process-spawning semantics.
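As a rough conceptual sketch of the thread-worker idea (assumptions: this is not the actual ThreadDataLoader implementation, and `threaded_loader` is a hypothetical name), workers become daemon threads feeding a shared queue, so no worker processes need to be spawned at all:

```python
import queue
import threading

def threaded_loader(dataset, num_workers=2):
    """Yield every item of `dataset`, fetched by worker threads
    instead of worker processes (item order is not guaranteed)."""
    q = queue.Queue()
    indices = list(range(len(dataset)))

    def worker(shard):
        for i in shard:
            q.put(dataset[i])

    # round-robin shard of the indices, one shard per worker thread
    threads = [
        threading.Thread(target=worker, args=(indices[w::num_workers],), daemon=True)
        for w in range(num_workers)
    ]
    for t in threads:
        t.start()
    for _ in indices:
        yield q.get()
    for t in threads:
        t.join()
```

Because threads share the parent process's memory, this pattern sidesteps spawn-based worker start-up on Windows, at the cost of the GIL and the thread-safety caveats raised in the review comments.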
Status
Work in progress
Types of changes
- Integration tests passed locally by running `./runtests.sh -f -u --net --coverage`.
- Quick tests passed locally by running `./runtests.sh --quick --unittests --disttests`.
- Documentation updated, tested the `make html` command in the `docs/` folder.