Generalizing Dask-XGBoost #3075
Comments
I was looking at the _train function in https://github.com/dask/dask-xgboost/blob/master/dask_xgboost/core.py and I found this line: "find the locations of all chunks and map them to particular Dask workers".
I made the following test code to see if this would use different workers but I am seeing
output
So given that the code above is using
This seems to do the trick for me.
That If you want the
>>> client.who_has(client.futures_of(ds)[0])
{"('from_pandas-02ae7c2929339175d14a7c1c3e7c60b2', 0)": ('tcp://127.0.0.1:59229',)}
I'd recommend avoiding
since that will be blocking each of the
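As a rough illustration of the "map chunks to workers" step discussed above, here is a minimal pure-Python sketch that inverts a mapping shaped like the who_has output (key to tuple of worker addresses) into worker to list of keys. The helper name group_keys_by_worker, the extra keys, and the second worker address are hypothetical and only there for illustration; choosing the first listed worker for a replicated key is an assumption, not necessarily what dask-xgboost does.

```python
from collections import defaultdict


def group_keys_by_worker(who_has):
    """Invert a key -> workers mapping into a worker -> [keys] mapping.

    Hypothetical helper for illustration; not part of dask or dask-xgboost.
    """
    worker_map = defaultdict(list)
    for key, workers in who_has.items():
        if not workers:
            # The key is not held in memory by any worker; skip it.
            continue
        # Assumption: if a key is replicated, use the first listed worker.
        worker_map[workers[0]].append(key)
    return dict(worker_map)


# Hypothetical data shaped like the who_has output shown in this thread.
# Only the first entry comes from the thread; the rest are made up.
who_has = {
    "('from_pandas-02ae7c2929339175d14a7c1c3e7c60b2', 0)": ("tcp://127.0.0.1:59229",),
    "('from_pandas-02ae7c2929339175d14a7c1c3e7c60b2', 1)": ("tcp://127.0.0.1:59230",),
    "('from_pandas-02ae7c2929339175d14a7c1c3e7c60b2', 2)": ("tcp://127.0.0.1:59229",),
}

worker_map = group_keys_by_worker(who_has)
for worker, keys in sorted(worker_map.items()):
    print(worker, len(keys))
```

If every key lands on the same worker address, all partitions live on one worker and a per-worker training step would only run there.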
Awesome, thank you for the clarification!
I would expect these two print statements to output the same but they output
Perhaps I am misunderstanding something. Is number 0 in this case
See #3236
In many ML workloads we want to do pre-processing with Dask, load all of the data into memory, and then hand off to some other system:
The Dask-XGBoost relationship does this in a few ways today:
The processes above work today, but there are some problems:
So, here are some things that we could do:
barrier, or collect_local_data_that_looks_like_X, might be useful.
I was doodling some pseudocode on a plane about what a solution for XGBoost might look like with some higher level primitives and came up with the following (although I don't think that people should read too much into it).
Disorganized XGBoost ravings
I think that focusing on what a good contract would look like for XGBoost, and then copying over one of the solutions for that, might be a helpful start.
cc @TomAugspurger @trivialfis @ogrisel @RAMitchell