
Support remote schedulers #72

Closed

mrocklin wants to merge 4 commits

Conversation

mrocklin
Member

This PR includes two major changes:

  1. We now query the scheduler for its state, rather than assuming that it is running locally (see the sketch below)
  2. We make a number of things async, in order to handle the communication inherent in (1)
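
Concretely, "query the scheduler for its state" means something like the following minimal sketch. It assumes distributed's async `Client` API rather than the exact code in this PR:

```python
from distributed import Client

async def fetch_scheduler_info(address: str) -> dict:
    # Client(asynchronous=True) works the same whether the scheduler is
    # in-process or on a remote machine; client.scheduler is an RPC proxy.
    async with Client(address, asynchronous=True) as client:
        # identity() returns a dict describing the scheduler: its id,
        # address, services, and currently connected workers.
        return await client.scheduler.identity()
```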

Things seem to work well for me locally, except that the dashboard proxy seems unhappy.

I'm testing with Dask's new SSHCluster, which is based on a new internal standard that everything should hopefully be built on in the near future.

@ian-r-rose if you have time I could use your help with the proxy bits. My attempt at this is in the last commit. I get errors like the following:

/Users/mrocklin/miniconda/envs/dev2/lib/python3.7/asyncio/events.py:88: RuntimeWarning: coroutine 'ProxyHandler.proxy' was never awaited

My guess is that jupyter-server-proxy doesn't support async implementations of ProxyHandler.proxy in subclasses.
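
For context, that warning appears whenever a coroutine function is called by machinery expecting a plain function, so the returned coroutine object is dropped instead of being scheduled on the event loop. A minimal reproduction (not jupyter-server-proxy's actual code):

```python
class Handler:
    async def proxy(self, port, path):
        # The subclass turned proxy() into a coroutine function...
        return f"proxied {path} on port {port}"

def dispatch(handler):
    # ...but the calling framework still invokes it like a plain method,
    # so the coroutine object is created and then silently discarded:
    handler.proxy(8787, "/status")
    # -> RuntimeWarning: coroutine 'Handler.proxy' was never awaited

dispatch(Handler())
```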

For testing I'm using the following config file, which currently depends on a few not-yet-merged PRs in dask.distributed.

labextension:
  factory:
    module: 'distributed.deploy.ssh2'
    class: 'SSHCluster'
    args: [["localhost", "localhost", "localhost", "localhost", "localhost", "localhost"]]
    kwargs:
      connect_kwargs:
        known_hosts: null
      scheduler_kwargs:
        port: 0
        dashboard_address: ':0'
      worker_kwargs:
        nthreads: 3
        memory_limit: '4 GiB'
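
A factory section like this is presumably resolved by importing the module, looking up the class, and applying the configured args/kwargs. Roughly (a hypothetical sketch; the names here are illustrative, not the extension's actual loader):

```python
import importlib

def make_cluster(factory: dict):
    # e.g. factory["module"] == "distributed.deploy.ssh2",
    #      factory["class"] == "SSHCluster"
    module = importlib.import_module(factory["module"])
    cluster_class = getattr(module, factory["class"])
    return cluster_class(*factory.get("args", []), **factory.get("kwargs", {}))
```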

Rather than assume that the Scheduler is local, we now query a remote
scheduler for information. This forces the `make_cluster_model` function
to be asynchronous, which in turn forces many other routes to be
asynchronous.
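A rough sketch of that ripple effect, with assumed shapes rather than the actual dask-labextension handlers: once the model builder awaits the scheduler, every route that calls it must be a coroutine as well.

```python
import json
from distributed import Client
from tornado import web

async def make_cluster_model(cluster) -> dict:
    # Awaiting the (possibly remote) scheduler is what makes this function,
    # and therefore every route that calls it, a coroutine.
    async with Client(cluster.scheduler_address, asynchronous=True) as client:
        info = await client.scheduler.identity()
    return {"id": cluster.name, "workers": len(info.get("workers", {}))}

class ClusterHandler(web.RequestHandler):
    async def get(self, cluster_id):
        cluster = self.settings["clusters"][cluster_id]  # hypothetical registry
        self.finish(json.dumps(await make_cluster_model(cluster)))
```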
This doesn't seem to work correctly today
Then upgrade distributed to current master as of today
@mrocklin
Member Author

This may still need some things in master. I plan to release distributed next week. I'll try this again after that.

@ian-r-rose
Collaborator

This looks reasonable to me. I'm not sure what the issue might be with the proxy handler, but I'd be happy to look at it once the required version of distributed has been released.

Perhaps this needs to be awaited now? https://github.com/dask/dask-labextension/pull/72/files#diff-9dd27938c6735fe022a08088466052c1R61

@mrocklin
Member Author

Perhaps this needs to be awaited now? https://github.com/dask/dask-labextension/pull/72/files#diff-9dd27938c6735fe022a08088466052c1R61

But the super implementation hasn't changed. I suspect that it is still synchronous.

@mrocklin
Member Author

I've also found another workaround for this issue, so this approach may become moot.

mrocklin added a commit to mrocklin/dask-labextension that referenced this pull request Jul 28, 2019
Supersedes dask#72

This depends on dask/distributed#2902, which
adds a `Cluster.scheduler_info` attribute to clusters which holds
necessary scheduler information.  We prefer this over querying a
Scheduler object directly in case that scheduler is not local, as is
increasingly becoming the case.
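In other words, the superseding approach reads a dict off the cluster object instead of opening a connection to the scheduler. A minimal sketch (the key names follow distributed's `scheduler_info` layout; treat the details as approximate):

```python
def make_cluster_model_from_info(cluster) -> dict:
    # scheduler_info is a plain dict kept up to date by the cluster manager,
    # so no round-trip to the scheduler is needed here.
    info = cluster.scheduler_info
    workers = info.get("workers", {})
    return {
        "address": info.get("address"),
        "workers": len(workers),
        "cores": sum(w.get("nthreads", 0) for w in workers.values()),
    }
```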
@mrocklin
Member Author

Potentially superseded by #73 (if we choose to adopt a convention internal to dask.distributed cluster managers).

@ian-r-rose
Collaborator

ian-r-rose commented Jul 28, 2019

Perhaps this needs to be awaited now? https://github.com/dask/dask-labextension/pull/72/files#diff-9dd27938c6735fe022a08088466052c1R61

But the super implementation hasn't changed. I suspect that it is still synchronous.

Weirdly, the super implementation is asynchronous. I'm not really sure what to make of that -- this is something that @yuvipanda knows much more about than I. I know he's out for a couple of weeks, but maybe we can ask him to weigh in when he's back.
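
If the parent implementation really is a coroutine function, the natural fix (a guess, pending input from @yuvipanda) would be for the override to be async and to await it, tolerating either case:

```python
import inspect
from jupyter_server_proxy.handlers import ProxyHandler

class AwaitingProxyHandler(ProxyHandler):
    async def proxy(self, *args, **kwargs):
        result = super().proxy(*args, **kwargs)
        # Handle either a sync or an async parent implementation.
        if inspect.isawaitable(result):
            result = await result
        return result
```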

mrocklin added a commit that referenced this pull request Aug 1, 2019
Supersedes #72

This depends on dask/distributed#2902, which
adds a `Cluster.scheduler_info` attribute to clusters which holds
necessary scheduler information.  We prefer this over querying a
Scheduler object directly in case that scheduler is not local, as is
increasingly becoming the case.
mrocklin closed this Aug 1, 2019