Define the new device parameter. #9362

Merged
merged 20 commits on Jul 13, 2023
Changes from 10 commits
1 change: 0 additions & 1 deletion CITATION
@@ -15,4 +15,3 @@
address = {New York, NY, USA},
keywords = {large-scale machine learning},
}

9 changes: 5 additions & 4 deletions doc/gpu/index.rst
@@ -22,21 +22,22 @@ Supported parameters
GPU accelerated prediction is enabled by default for the above mentioned ``tree_method`` parameters but can be switched to CPU prediction by setting ``predictor`` to ``cpu_predictor``. This could be useful if you want to conserve GPU memory. Likewise when using CPU algorithms, GPU accelerated prediction can be enabled by setting ``predictor`` to ``gpu_predictor``.

The device ordinal (which GPU to use if you have many of them) can be selected using the
``gpu_id`` parameter, which defaults to 0 (the first device reported by CUDA runtime).
``device`` parameter, which defaults to 0 when ``cuda`` is specified (the first device reported by the CUDA
runtime).


The GPU algorithms currently work with CLI, Python, R, and JVM packages. See :doc:`/install` for details.

.. code-block:: python
:caption: Python example

param['gpu_id'] = 0
param["device"] = "cuda:0"
param['tree_method'] = 'gpu_hist'

.. code-block:: python
:caption: With Scikit-Learn interface

XGBRegressor(tree_method='gpu_hist', gpu_id=0)
XGBRegressor(tree_method='gpu_hist', device="cuda")


GPU-Accelerated SHAP values
@@ -45,7 +46,7 @@ XGBoost makes use of `GPUTreeShap <https://github.com/rapidsai/gputreeshap>`_ as

.. code-block:: python

model.set_param({"gpu_id": "0", "tree_method": "gpu_hist"})
model.set_param({"device": "cuda:0", "tree_method": "gpu_hist"})
shap_values = model.predict(dtrain, pred_contribs=True)
shap_interaction_values = model.predict(dtrain, pred_interactions=True)

8 changes: 4 additions & 4 deletions doc/install.rst
@@ -3,10 +3,10 @@ Installation Guide
##################

XGBoost provides binary packages for some language bindings. The binary packages support
the GPU algorithm (``gpu_hist``) on machines with NVIDIA GPUs. Please note that **training
with multiple GPUs is only supported for Linux platform**. See :doc:`gpu/index`. Also we
have both stable releases and nightly builds, see below for how to install them. For
building from source, visit :doc:`this page </build>`.
the GPU algorithm (``device=cuda:0``) on machines with NVIDIA GPUs. Please note that
**training with multiple GPUs is only supported on the Linux platform**. See
:doc:`gpu/index`. We also provide both stable releases and nightly builds; see below for how
to install them. For building from source, visit :doc:`this page </build>`.
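
A quick orientation sketch (not part of this diff), assuming the stable binary wheel was installed from PyPI with ``pip install xgboost``; the ``device`` parameter introduced by this PR requires a 2.0+ build:

.. code-block:: python

   # Check which release is installed; the new ``device`` parameter needs XGBoost 2.0+.
   import xgboost as xgb

   print(xgb.__version__)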

.. contents:: Contents

39 changes: 20 additions & 19 deletions doc/parameter.rst
@@ -59,6 +59,18 @@ General Parameters

- Feature dimension used in boosting, set to maximum dimension of the feature

* ``device`` [default= ``cpu``]

.. versionadded:: 2.0.0

- The device for XGBoost to run on. Users can set it to one of the following values (a brief usage sketch follows the list):

+ ``cpu``: Use CPU.
+ ``cuda``: Use a GPU (CUDA device).
+ ``cuda:<ordinal>``: ``<ordinal>`` is an integer that specifies the ordinal of the GPU (which GPU to use if you have more than one device).
+ ``gpu``: Same as ``cuda``.
Review discussion on the ``gpu`` value:

@napetrov (Contributor): I understand that these are equivalent from XGBoost's perspective for now, but since we are talking about an API that will be used for many years to come, would it make sense not to introduce this restriction?

@trivialfis (Member, Author), Jul 13, 2023: Absolutely, that's why we have both ``gpu`` and ``cuda``. It's equal to ``cuda`` "for now"; in the future, others can add variants based on the GPU devices available at run time.

@trivialfis: @napetrov Feel free to share your suggestions. ;-) Would like to get more opinions.

@napetrov: I mean that, from an API perspective, users would be writing code assuming the GPU device is always CUDA, but this would result in a breaking change once another GPU backend exists.

It might be worth pointing out that ``gpu`` is not the same as ``cuda`` but just a convenient way of selecting a default GPU device - so in the long term this would mean GPU dispatching regardless of the particular hardware. That is, right now it is not the same as ``cuda`` but a default GPU device selector, although it can currently select only from [``cuda``]. This way we would clearly define the expectations from the API and allow extensions here.

@trivialfis: Thank you for sharing, will change the document as suggested.

@napetrov:

Suggested change
+ ``gpu``: Same as ``cuda``.
+ ``gpu``: Default GPU device selection from the list of available and supported devices. Only ``cuda`` devices are supported currently.

Something along these lines.

@trivialfis: Done.

+ ``gpu:<ordinal>``: Same as ``cuda:<ordinal>``.
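
A minimal usage sketch (illustration only, not part of this diff), assuming a CUDA-enabled XGBoost 2.0+ build; the synthetic data only keeps the snippet self-contained:

.. code-block:: python

   import numpy as np
   import xgboost as xgb

   # Train on the first CUDA device by selecting it through the new ``device`` parameter.
   X = np.random.rand(256, 8)
   y = np.random.rand(256)
   dtrain = xgb.DMatrix(X, label=y)

   params = {"tree_method": "hist", "device": "cuda:0", "objective": "reg:squarederror"}
   booster = xgb.train(params, dtrain, num_boost_round=10)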

Parameters for Tree Booster
===========================
* ``eta`` [default=0.3, alias: ``learning_rate``]
@@ -99,7 +111,7 @@ Parameters for Tree Booster
- ``gradient_based``: the selection probability for each training instance is proportional to the
*regularized absolute value* of gradients (more specifically, :math:`\sqrt{g^2+\lambda h^2}`).
``subsample`` may be set to as low as 0.1 without loss of model accuracy. Note that this
sampling method is only supported when ``tree_method`` is set to ``gpu_hist``; other tree
sampling method is only supported when ``tree_method`` is set to ``hist`` and the device is ``cuda``; other tree
methods only support ``uniform`` sampling.
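
For illustration (a sketch, not part of this diff), gradient-based sampling combined with the new ``device`` parameter; assumes a CUDA-enabled build:

.. code-block:: python

   import numpy as np
   import xgboost as xgb

   # gradient_based sampling is only available with tree_method="hist" on a CUDA device;
   # subsample can then be set as low as 0.1 without loss of accuracy.
   X, y = np.random.rand(512, 16), np.random.rand(512)
   dtrain = xgb.DMatrix(X, label=y)
   params = {
       "tree_method": "hist",
       "device": "cuda",
       "sampling_method": "gradient_based",
       "subsample": 0.1,
   }
   booster = xgb.train(params, dtrain, num_boost_round=10)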

* ``colsample_bytree``, ``colsample_bylevel``, ``colsample_bynode`` [default=1]
@@ -131,26 +143,15 @@ Parameters for Tree Booster
* ``tree_method`` string [default= ``auto``]

- The tree construction algorithm used in XGBoost. See description in the `reference paper <http://arxiv.org/abs/1603.02754>`_ and :doc:`treemethod`.
- XGBoost supports ``approx``, ``hist`` and ``gpu_hist`` for distributed training. Experimental support for external memory is available for ``approx`` and ``gpu_hist``.

- Choices: ``auto``, ``exact``, ``approx``, ``hist``, ``gpu_hist``, this is a
combination of commonly used updaters. For other updaters like ``refresh``, set the
parameter ``updater`` directly.

- ``auto``: Use heuristic to choose the fastest method.

- For small dataset, exact greedy (``exact``) will be used.
- For larger dataset, approximate algorithm (``approx``) will be chosen. It's
recommended to try ``hist`` and ``gpu_hist`` for higher performance with large
dataset.
(``gpu_hist``)has support for ``external memory``.
- Choices: ``auto``, ``exact``, ``approx``, ``hist``; this is a combination of commonly
used updaters. For other updaters like ``refresh``, set the parameter ``updater``
directly.

- Because old behavior is always use exact greedy in single machine, user will get a
message when approximate algorithm is chosen to notify this choice.
- ``auto``: Same as the ``hist`` tree method.
- ``exact``: Exact greedy algorithm. Enumerates all split candidates.
- ``approx``: Approximate greedy algorithm using quantile sketch and gradient histogram.
- ``hist``: Faster histogram optimized approximate greedy algorithm.
- ``gpu_hist``: GPU implementation of ``hist`` algorithm.
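
To illustrate the mapping implied by this change (a sketch, not part of the diff): the removed ``gpu_hist`` value corresponds to ``hist`` plus the new ``device`` parameter:

.. code-block:: python

   import xgboost as xgb

   # Old style (removed above): XGBRegressor(tree_method="gpu_hist", gpu_id=0)
   # New style: histogram tree method on the first CUDA device via ``device``.
   reg = xgb.XGBRegressor(tree_method="hist", device="cuda:0")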

* ``scale_pos_weight`` [default=1]

@@ -163,7 +164,7 @@
- ``grow_colmaker``: non-distributed column-based construction of trees.
- ``grow_histmaker``: distributed tree construction with row-based data splitting based on global proposal of histogram counting.
- ``grow_quantile_histmaker``: Grow tree using quantized histogram.
- ``grow_gpu_hist``: Grow tree with GPU.
- ``grow_gpu_hist``: Grow tree with GPU. Same as setting the tree method to ``hist`` and using ``device=cuda``.
- ``sync``: synchronizes trees in all distributed nodes.
- ``refresh``: refreshes tree's statistics and/or leaf values based on the current data. Note that no random subsampling of data rows is performed.
- ``prune``: prunes the splits where loss < min_split_loss (or gamma) and nodes that have depth greater than ``max_depth``.
@@ -183,7 +184,7 @@ Parameters for Tree Booster
* ``grow_policy`` [default= ``depthwise``]

- Controls a way new nodes are added to the tree.
- Currently supported only if ``tree_method`` is set to ``hist``, ``approx`` or ``gpu_hist``.
- Currently supported only if ``tree_method`` is set to ``hist`` or ``approx``.
- Choices: ``depthwise``, ``lossguide``

- ``depthwise``: split at nodes closest to the root.
@@ -195,7 +196,7 @@

* ``max_bin``, [default=256]

- Only used if ``tree_method`` is set to ``hist``, ``approx`` or ``gpu_hist``.
- Only used if ``tree_method`` is set to ``hist`` or ``approx``.
- Maximum number of discrete bins to bucket continuous features.
- Increasing this number improves the optimality of splits at the cost of higher computation time.

60 changes: 28 additions & 32 deletions doc/treemethod.rst
@@ -3,14 +3,14 @@ Tree Methods
############

For training boosted tree models, there are 2 parameters used for choosing algorithms,
namely ``updater`` and ``tree_method``. XGBoost has 4 builtin tree methods, namely
``exact``, ``approx``, ``hist`` and ``gpu_hist``. Along with these tree methods, there
are also some free standing updaters including ``refresh``,
``prune`` and ``sync``. The parameter ``updater`` is more primitive than ``tree_method``
as the latter is just a pre-configuration of the former. The difference is mostly due to
historical reasons that each updater requires some specific configurations and might has
missing features. As we are moving forward, the gap between them is becoming more and
more irrelevant. We will collectively document them under tree methods.
namely ``updater`` and ``tree_method``. XGBoost has 3 builtin tree methods, namely
``exact``, ``approx`` and ``hist``. Along with these tree methods, there are also some
free standing updaters including ``refresh``, ``prune`` and ``sync``. The parameter
``updater`` is more primitive than ``tree_method`` as the latter is just a
pre-configuration of the former. The difference is mostly due to historical reasons that
each updater requires some specific configurations and might have missing features. As we
are moving forward, the gap between them is becoming more and more irrelevant. We will
collectively document them under tree methods.

**************
Exact Solution
@@ -19,23 +19,23 @@ Exact Solution
Exact means XGBoost considers all candidates from data for tree splitting, but underlying
the objective is still interpreted as a Taylor expansion.

1. ``exact``: Vanilla gradient boosting tree algorithm described in `reference paper
<http://arxiv.org/abs/1603.02754>`_. During each split finding procedure, it iterates
over all entries of input data. It's more accurate (among other greedy methods) but
slow in computation performance. Also it doesn't support distributed training as
XGBoost employs row spliting data distribution while ``exact`` tree method works on a
sorted column format. This tree method can be used with parameter ``tree_method`` set
to ``exact``.
1. ``exact``: The vanilla gradient boosting tree algorithm described in `reference paper
<http://arxiv.org/abs/1603.02754>`_. During split-finding, it iterates over all
entries of input data. It's more accurate (among other greedy methods) but
computationally slower compared to other tree methods. Furthermore, its feature
set is limited. Features like distributed training and external memory that require
approximated quantiles are not supported. This tree method can be used with the
parameter ``tree_method`` set to ``exact``.
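
A minimal sketch (not part of this diff) of selecting the exact method; it is only suitable for data small enough to enumerate every split candidate:

.. code-block:: python

   import numpy as np
   import xgboost as xgb

   # Enumerate all split candidates with the exact greedy algorithm.
   X, y = np.random.rand(100, 4), np.random.rand(100)
   dtrain = xgb.DMatrix(X, label=y)
   booster = xgb.train({"tree_method": "exact"}, dtrain, num_boost_round=5)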


**********************
Approximated Solutions
**********************

As ``exact`` tree method is slow in performance and not scalable, we often employ
approximated training algorithms. These algorithms build a gradient histogram for each
node and iterate through the histogram instead of real dataset. Here we introduce the
implementations in XGBoost below.
As the ``exact`` tree method is slow in computation and difficult to scale, we often
employ approximated training algorithms. These algorithms build a gradient histogram for
each node and iterate through the histogram instead of the real dataset. Here we
introduce the implementations in XGBoost.

1. ``approx`` tree method: An approximation tree method described in `reference paper
<http://arxiv.org/abs/1603.02754>`_. It runs sketching before building each tree
@@ -48,22 +48,18 @@ implementations in XGBoost below.
this global sketch. This is the fastest algorithm as it runs sketching only once. The
algorithm can be accessed by setting ``tree_method`` to ``hist``.
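
A sketch (not part of this diff) contrasting how the two approximated methods are selected:

.. code-block:: python

   import numpy as np
   import xgboost as xgb

   # ``approx`` re-runs sketching before each tree; ``hist`` builds one global sketch
   # up front and reuses it, which is why it is the faster of the two.
   X, y = np.random.rand(500, 10), np.random.rand(500)
   dtrain = xgb.DMatrix(X, label=y)
   for method in ("approx", "hist"):
       xgb.train({"tree_method": method}, dtrain, num_boost_round=5)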

3. ``gpu_hist`` tree method: The ``gpu_hist`` tree method is a GPU implementation of
``hist``, with additional support for gradient based sampling. The algorithm can be
accessed by setting ``tree_method`` to ``gpu_hist``.

************
Implications
************

Some objectives like ``reg:squarederror`` have constant hessian. In this case, ``hist``
or ``gpu_hist`` should be preferred as weighted sketching doesn't make sense with constant
Some objectives like ``reg:squarederror`` have a constant hessian. In this case, the
``hist`` tree method should be preferred as weighted sketching doesn't make sense with constant
weights. When using non-constant hessian objectives, sometimes ``approx`` yields better
accuracy, but with slower computation performance. Most of the time using ``(gpu)_hist``
with higher ``max_bin`` can achieve similar or even superior accuracy while maintaining
good performance. However, as xgboost is largely driven by community effort, the actual
implementations have some differences than pure math description. Result might have
slight differences than expectation, which we are currently trying to overcome.
accuracy, but with slower computation performance. Most of the time using ``hist`` with
higher ``max_bin`` can achieve similar or even superior accuracy while maintaining good
performance. However, as xgboost is largely driven by community effort, the actual
implementations have some differences from the pure math description. Results might
differ slightly from expectations, which we are currently trying to overcome.
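
A sketch of the tuning advice above (not part of this diff): keep ``hist`` and raise ``max_bin`` when more split resolution is needed:

.. code-block:: python

   import numpy as np
   import xgboost as xgb

   # Raising max_bin (default 256) trades extra histogram-building time for
   # finer-grained split candidates.
   X, y = np.random.rand(1024, 32), np.random.rand(1024)
   dtrain = xgb.DMatrix(X, label=y)
   params = {"tree_method": "hist", "max_bin": 512}
   booster = xgb.train(params, dtrain, num_boost_round=20)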

**************
Other Updaters
@@ -106,8 +102,8 @@ solely for the interest of documentation.
histogram creation step and uses sketching values directly during split evaluation. It
was never tested and contained some unknown bugs, so we decided to remove it and focus our
resources on more promising algorithms instead. For accuracy, most of the time
``approx``, ``hist`` and ``gpu_hist`` are enough with some parameters tuning, so
removing them don't have any real practical impact.
``approx`` and ``hist`` are enough with some parameter tuning, so removing them doesn't
have any real practical impact.

3. ``grow_local_histmaker`` updater: An approximation tree method described in `reference
paper <http://arxiv.org/abs/1603.02754>`_. This updater was rarely used in practice so
2 changes: 1 addition & 1 deletion doc/tutorials/dask.rst
@@ -149,7 +149,7 @@ Also for inplace prediction:
.. code-block:: python

# where X is a dask DataFrame or dask Array backed by cupy or cuDF.
booster.set_param({"gpu_id": "0"})
booster.set_param({"device": "cuda:0"})
prediction = xgb.dask.inplace_predict(client, booster, X)

When input is ``da.Array`` object, output is always ``da.Array``. However, if the input
2 changes: 1 addition & 1 deletion doc/tutorials/saving_model.rst
@@ -163,7 +163,7 @@ Will print out something similar to (not actual output as it's too long for demo
{
"Learner": {
"generic_parameter": {
"gpu_id": "0",
"device": "cuda:0",
"gpu_page_size": "0",
"n_jobs": "0",
"random_state": "0",
2 changes: 1 addition & 1 deletion include/xgboost/base.h
@@ -119,7 +119,7 @@ using bst_group_t = std::uint32_t; // NOLINT
*/
using bst_target_t = std::uint32_t; // NOLINT
/**
* brief Type for indexing boosted layers.
* @brief Type for indexing boosted layers.
*/
using bst_layer_t = std::int32_t; // NOLINT
/**