
Set GPU ThreadPoolExecutor and set known libraries to use it #5084

Open · mrocklin opened this issue Jul 17, 2021 · 11 comments

@mrocklin

With @madsbk's recent work allowing for multiple executors, we might consider creating a GPU ThreadPoolExecutor in workers by default if a GPU is detected, and then annotating tasks known to be GPU-targeted with that executor. This would improve the likelihood that a user of vanilla dask has a good time with RAPIDS, cupy, or other known projects.

We probably can't do this in full generality (it's hard to detect which code uses GPUs), but we're no worse off if we don't catch something, and we can handle the common cases well.

Concretely, I propose:

  1. Having the Worker class try importing pynvml (or some future NVIDIA Python library) and, if it detects a GPU, creating a single-threaded ThreadPoolExecutor:

    try:
        import pynvml
        pynvml.nvmlInit()  # raises if no NVIDIA driver is present
    except Exception:
        pass
    else:
        if pynvml.nvmlDeviceGetCount() > 0:
            self.executors["gpu"] = ThreadPoolExecutor(1)
  2. In known GPU libraries, adding an annotation to every layer:

    class ArrayLayer:
        def __init__(self, ...):
            if "cupy" in str(type(self._meta)):
                self.annotations["executor"] = "gpu"  # or use setdefault or something
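As an aside, string-matching on `str(type(...))` works but is brittle; a slightly more robust check (a hypothetical helper, not RAPIDS code) could inspect the meta object's defining module without importing cupy at all:

```python
import numpy as np

def is_cupy_backed(meta):
    # Hypothetical helper: look at the top-level module that defines the
    # meta object's type; this avoids importing cupy just to test for it.
    return type(meta).__module__.split(".")[0] == "cupy"

print(is_cupy_backed(np.empty(0)))  # False for a NumPy-backed meta
```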

cc @madsbk @quasiben @kkraus14

dask-cuda handles this for users who use it, so this feels like something that we could upstream. It would also help with mixed CPU-GPU computation.
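The routing idea behind the two steps above can be sketched independently of distributed's internals: a worker keeps a dict of named executors and picks one based on a task's annotations. Everything below (the `ToyWorker` class and its `submit` signature) is illustrative, not distributed's actual API:

```python
from concurrent.futures import ThreadPoolExecutor

class ToyWorker:
    """Toy model of a worker that routes tasks to named executors."""

    def __init__(self):
        self.executors = {"default": ThreadPoolExecutor(4)}
        # If a GPU were detected, register a single-threaded executor for it
        self.executors["gpu"] = ThreadPoolExecutor(1)

    def submit(self, fn, *args, annotations=None):
        # Tasks annotated with executor="gpu" land on the GPU pool;
        # everything else falls back to the default pool.
        name = (annotations or {}).get("executor", "default")
        return self.executors[name].submit(fn, *args)

worker = ToyWorker()
fut = worker.submit(lambda x: x + 1, 41, annotations={"executor": "gpu"})
print(fut.result())  # 42
```

The key design point is that the annotation only selects *which* pool runs the task; unannotated work is unaffected, which is exactly the opt-in behavior discussed below.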

@mrocklin

cc @rjzamora @pentschev

@mrocklin

So RAPIDS folks, would it be in-scope for RAPIDS to add annotations={"executor": "gpu"} to all layers?

mrocklin added a commit to mrocklin/distributed that referenced this issue Jul 26, 2021
@mrocklin

Adding this to the Worker here: #5123

@quasiben

@pentschev what do you think about this? We've seen some cases recently where users run into issues where a GPU is "detected" and either they don't want to use it or, in rare cases, the GPU is not actually there. The default, however, can be nice for the vast majority of users who do want to use GPUs with Dask.

@mrocklin, we have run into many issues around getting the order of CUDA context creation correct, and around forking processes and creating CUDA contexts at the right time when using UCX (though I think the UCX folks are working on resolving this bug).

@mrocklin

To be clear, this doesn't make users use this executor. It just adds one for people to use if they want it. It doesn't change the default behavior of Dask. It just allows people to use annotations like the following:

with dask.annotate(executor="gpu"):
    ...  # my dask code

And then they'll be assured that their code will run on a single thread.
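The mechanics of such an annotation context manager can be mimicked in a few lines of plain Python. This is an illustrative sketch of the *pattern* (a context variable holding the current annotations, picked up by any "layer" built inside the block), not dask's actual implementation; `annotate` and `make_layer` here are stand-ins:

```python
import contextlib
import contextvars

# Context variable holding the annotations currently in effect.
_annotations = contextvars.ContextVar("annotations", default={})

@contextlib.contextmanager
def annotate(**kwargs):
    # Merge new annotations over any already in effect, restore on exit.
    merged = {**_annotations.get(), **kwargs}
    token = _annotations.set(merged)
    try:
        yield
    finally:
        _annotations.reset(token)

def make_layer():
    # A stand-in for graph construction: a real layer would capture the
    # active annotations at build time in the same way.
    return {"annotations": dict(_annotations.get())}

with annotate(executor="gpu"):
    layer = make_layer()

print(layer["annotations"])  # {'executor': 'gpu'}
```

Because the annotation is captured at graph-construction time, the scheduler can later route those tasks to the matching executor without the user's task functions changing at all.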

@pentschev

We've seen some cases recently where users run into issues where a GPU is "detected" and either they don't want to use it or, in rare cases, the GPU is not actually there. The default, however, can be nice for the vast majority of users who do want to use GPUs with Dask.

Ben has a good point. In fact #5121 is fixing yet another of those. In that case the user is running Distributed with nvidia-docker but without a GPU, causing another uncaught exception to be raised.

I think the overall idea is useful, and as long as this doesn't force a new default on users who happen to have GPU(s) installed, then I see no reason not to do that.

@mrocklin

I'm curious, would you all consider adding an executor="gpu" annotation to RAPIDS layers? If not, what situations would cause you to be concerned?

@mrocklin

I would like to try to upstream some of the more general parts of dask-cuda.

@jakirkham

I would like to try to upstream some of the more general parts of dask-cuda.

This sounds interesting. What other things would you be open to upstreaming? Some thoughts on things that might be of interest upstream:

  • Explicit comms
  • JIT spilling
  • Transmission of spilled objects
  • CUDA-specific spilling
  • ?

Historically one of the issues here has been the lack of GPU support on CI. If we are able to tackle that (dask/community#138), maybe this becomes more reasonable?

@mrocklin

I think that there are good ideas behind all of those things, and like the PR here does for threads and executors, we would need to find nice generic ways to incorporate them.

@rjzamora

would you all consider adding an executor="gpu" annotation to RAPIDS layers?

Peter and Ben probably have the best idea of what the appropriate defaults should be, but this particular request seems reasonable to me.
