[BUG] Using Client.wait_for_workers Does Not Properly Wait for Workers #4082

Open
Tracked by #3256
alexbarghi-nv opened this issue Jan 9, 2024 · 4 comments
Labels
bug Something isn't working

Comments

@alexbarghi-nv
Member

While running benchmarks for the GNN packages in a multi-node environment, @jnke2016 and I found that Client.wait_for_workers does not reliably wait for the workers: the Dask workflow that runs after the call hangs or crashes. As a workaround, we currently run a separate script (wait_for_workers.py) that waits for all workers before the workflow is launched. We should fix the bug and remove the workaround, so that Client.wait_for_workers can be called as the Dask API intends.
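
For readers unfamiliar with this kind of workaround, here is a minimal sketch of a polling wait. It is not the actual wait_for_workers.py, whose contents are not shown in this issue; the function name `wait_for_n_workers`, its parameters, and the defaults are all hypothetical. The worker count comes from a caller-supplied callable, so the loop itself does not depend on a running cluster.

```python
import time
from typing import Callable


def wait_for_n_workers(
    count_workers: Callable[[], int],
    n_workers: int,
    timeout: float = 300.0,
    poll_interval: float = 1.0,
) -> int:
    """Poll count_workers() until it reports at least n_workers.

    Returns the observed worker count, or raises TimeoutError once
    `timeout` seconds have passed without reaching n_workers.
    """
    deadline = time.monotonic() + timeout
    while True:
        connected = count_workers()
        if connected >= n_workers:
            return connected
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"only {connected} of {n_workers} workers connected "
                f"after {timeout} s"
            )
        time.sleep(poll_interval)


# Hypothetical use against a live cluster (not run here; the scheduler
# file name and the worker count are placeholders, and counting workers
# via Client.nthreads() is an assumption about how one might do it):
#
#   from dask.distributed import Client
#   client = Client(scheduler_file="scheduler.json")
#   wait_for_n_workers(lambda: len(client.nthreads()), n_workers=16)
```

Polling with an explicit deadline means a missing worker surfaces as a TimeoutError before the workflow starts, rather than as a hang later on. Whether this avoids the underlying problem in Client.wait_for_workers is not established in this thread.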

@alexbarghi-nv alexbarghi-nv self-assigned this Jan 9, 2024
@alexbarghi-nv alexbarghi-nv added the bug Something isn't working label Jan 9, 2024
@alexbarghi-nv alexbarghi-nv added this to the 24.04 milestone Jan 9, 2024
@wence-
Contributor

wence- commented Jan 12, 2024

Possibly related to dask/distributed#8314?

@alexbarghi-nv
Member Author

Could be, I'll definitely test once that PR is merged.

@wence-
Contributor

wence- commented Jan 17, 2024

Not sure it will be, sorry. The approach I had there was not considered appropriate long term. I'll see if I can dig up the current state of any discussions.

@jnke2016
Contributor

jnke2016 commented Feb 13, 2024

> The approach I had there was not considered appropriate long term. I'll see if I can dig up the current state of any discussions

@wence-, did you get any feedback?

@alexbarghi-nv alexbarghi-nv modified the milestones: 24.04, 24.08 Apr 29, 2024
@alexbarghi-nv alexbarghi-nv removed their assignment May 8, 2024
@BradReesWork BradReesWork removed this from the 24.08 milestone Sep 30, 2024