make HostDeviceVector single gpu only #4773
Conversation
@rongou I haven't looked into this in detail yet. But seems like a lot of code can be removed or cleaned up now?
@trivialfis yes there is a whole bunch of stuff that can be removed or cleaned up. I didn't want to make this bigger than it already is, but I'll do some of the more obvious ones.
@rongou Nice. Please ping me when it's ready.
It's ready now.
@@ -352,40 +310,45 @@ class GPUPredictor : public xgboost::Predictor {
  InitModel(model, tree_begin, tree_end);

  size_t batch_offset = 0;
  auto* preds = out_preds;
  std::unique_ptr<HostDeviceVector<bst_float>> batch_preds{nullptr};
Instead of the dual copy:
- copying the input prediction vector into a temporary one sized to the batch, and
- copying the output prediction vector from the kernel back into `out_preds`,

is it not possible to slice and dice `out_preds` based on the batch size? Basically:
- get a device span from `out_preds`,
- take a subspan based on the batch size and batch offset, and
- pass that subspan to predict internal as a pass-through to the kernel.

A rough sketch of this idea follows below.
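Something like the following, roughly (`PredictBatchKernel`, the header paths, and the exact span accessors are assumptions for illustration, not the PR's actual code):

```cpp
// A sketch only, not the code in this PR. Assumes xgboost's single-GPU
// HostDeviceVector API (DeviceSpan(), common::Span::subspan()); the header
// paths and PredictBatchKernel below are illustrative assumptions.
#include <cstddef>
#include <vector>

#include <xgboost/base.h>
#include <xgboost/host_device_vector.h>

// Hypothetical pass-through to the device kernel: writes predictions for one
// batch directly into the output slice it is handed.
void PredictBatchKernel(xgboost::common::Span<xgboost::bst_float> batch_out);

void PredictAllBatches(xgboost::HostDeviceVector<xgboost::bst_float>* out_preds,
                       const std::vector<std::size_t>& batch_rows,  // rows per batch
                       std::size_t num_group) {
  // One device span over the whole prediction vector; no temporary per-batch copy.
  auto d_preds = out_preds->DeviceSpan();
  std::size_t batch_offset = 0;
  for (std::size_t rows : batch_rows) {
    // Carve out this batch's portion of the output and pass it straight through.
    auto batch_out = d_preds.subspan(batch_offset * num_group, rows * num_group);
    PredictBatchKernel(batch_out);
    batch_offset += rows;
  }
}
```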
That'll assume the whole prediction vector will be on device, right? I think the current approach doesn't put the whole vector on device in external memory mode, although I haven't 100% verified it.
even if the device copy is prevented here, it'll be copied right back (soon after this) in the objective functions while computing the gradients, or later when the prediction cache is updated during training.
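For illustration only, the copy-back described above happens as soon as any consumer asks the `HostDeviceVector` for a device view; a minimal sketch, assuming the single-GPU API this PR moves to (`SetDevice`, `ConstDeviceSpan` are my assumptions about that API):

```cpp
// Illustrative only; API names are assumed from the single-GPU HostDeviceVector.
#include <xgboost/base.h>
#include <xgboost/host_device_vector.h>

void CopyBackExample(xgboost::HostDeviceVector<xgboost::bst_float>* preds) {
  // Suppose the predictor left `preds` resident on the host.
  const auto& h_preds = preds->ConstHostVector();
  (void)h_preds;

  // The objective then asks for a device view to compute gradients; at that
  // point the predictions get copied back to the device anyway.
  preds->SetDevice(0);
  auto d_preds = preds->ConstDeviceSpan();
  (void)d_preds;
}
```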
Sounds reasonable. I'll send a follow-up PR.
@RAMitchell @trivialfis @sriramch I think this is as far as this PR would go. Please take another look. I still want to remove the sharding in
Awesome! This will allow us to focus more on the algorithm instead of data distribution.
Closes #4531
Part of 1.0.0 roadmap #4680
Finally got the CI to pass. I can do more cleanups, but thought you guys should take a look first.
@RAMitchell @trivialfis @sriramch