[docs] document rounding behavior of floating point numbers in categorical features #5009

Merged · 1 commit · Feb 16, 2022
1 change: 1 addition & 0 deletions docs/Advanced-Topics.rst
@@ -25,6 +25,7 @@ Categorical Feature Support

- Categorical features must be encoded as non-negative integers (``int``) less than ``Int32.MaxValue`` (2147483647).
It is best to use a contiguous range of integers starting from zero.
Floating point numbers in categorical features will be rounded towards 0.

- Use ``min_data_per_group``, ``cat_smooth`` to deal with over-fitting (when ``#data`` is small or ``#category`` is large).

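The sentence added above documents truncation: a float-valued category code is mapped to the integer obtained by rounding towards zero. A minimal sketch of what that means, using plain NumPy rather than LightGBM internals (the values are made up for illustration):

```python
# Illustration of "rounded towards 0" (truncation), assuming it behaves like
# numpy.trunc applied to the raw category codes.
import numpy as np

raw = np.array([0.2, 1.7, 2.0, -1.9])
print(np.trunc(raw))  # [ 0.  1.  2. -1.]
# 0.2 -> category 0, 1.7 -> category 1, 2.0 -> category 2;
# -1.9 -> -1, which is negative and would therefore be treated as a missing
# value per the existing rule that negative categorical values are missing.
```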
2 changes: 2 additions & 0 deletions python-package/lightgbm/basic.py
@@ -1159,6 +1159,7 @@ def __init__(self, data, label=None, reference=None,
Large values could be memory consuming. Consider using consecutive integers starting from zero.
All negative values in categorical features will be treated as missing values.
The output cannot be monotonically constrained with respect to a categorical feature.
Floating point numbers in categorical features will be rounded towards 0.
params : dict or None, optional (default=None)
Other parameters for Dataset.
free_raw_data : bool, optional (default=True)
@@ -3563,6 +3564,7 @@ def refit(
Large values could be memory consuming. Consider using consecutive integers starting from zero.
All negative values in categorical features will be treated as missing values.
The output cannot be monotonically constrained with respect to a categorical feature.
Floating point numbers in categorical features will be rounded towards 0.
dataset_params : dict or None, optional (default=None)
Other parameters for Dataset ``data``.
free_raw_data : bool, optional (default=True)
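A hedged usage sketch for the ``Dataset`` docstring change above: column 0 carries float-typed category codes, so 1.2 and 1.7 should land in the same category (1) after rounding towards 0. The data, parameters, and number of boosting rounds are illustrative only, not taken from this PR.

```python
import numpy as np
import lightgbm as lgb

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
# Float "category codes" in column 0; 1.2 and 1.7 both truncate to category 1.
X[:, 0] = rng.choice([0.0, 1.2, 1.7, 2.9], size=100)
y = rng.integers(0, 2, size=100)

train_set = lgb.Dataset(X, label=y, categorical_feature=[0])
booster = lgb.train({"objective": "binary", "verbosity": -1},
                    train_set, num_boost_round=5)
```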
2 changes: 2 additions & 0 deletions python-package/lightgbm/engine.py
@@ -109,6 +109,7 @@ def train(
Large values could be memory consuming. Consider using consecutive integers starting from zero.
All negative values in categorical features will be treated as missing values.
The output cannot be monotonically constrained with respect to a categorical feature.
Floating point numbers in categorical features will be rounded towards 0.
keep_training_booster : bool, optional (default=False)
Whether the returned Booster will be used to keep training.
If False, the returned value will be converted into _InnerPredictor before returning.
@@ -463,6 +464,7 @@ def cv(params, train_set, num_boost_round=100,
Large values could be memory consuming. Consider using consecutive integers starting from zero.
All negative values in categorical features will be treated as missing values.
The output cannot be monotonically constrained with respect to a categorical feature.
Floating point numbers in categorical features will be rounded towards 0.
fpreproc : callable or None, optional (default=None)
Preprocessing function that takes (dtrain, dtest, params)
and returns transformed versions of those.
1 change: 1 addition & 0 deletions python-package/lightgbm/sklearn.py
@@ -262,6 +262,7 @@ def __call__(self, preds, dataset):
Large values could be memory consuming. Consider using consecutive integers starting from zero.
All negative values in categorical features will be treated as missing values.
The output cannot be monotonically constrained with respect to a categorical feature.
Floating point numbers in categorical features will be rounded towards 0.
callbacks : list of callable, or None, optional (default=None)
List of callback functions that are applied at each iteration.
See Callbacks in Python API for more information.
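The same rounding behavior applies through the scikit-learn interface whose docstring is touched above; a short sketch with hypothetical data and illustrative hyperparameters:

```python
import numpy as np
from lightgbm import LGBMClassifier

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
# Floats rounded towards 0 -> categories 0, 1, 2.
X[:, 0] = rng.choice([0.0, 1.7, 2.3], size=200)
y = rng.integers(0, 2, size=200)

clf = LGBMClassifier(n_estimators=10, verbosity=-1)
clf.fit(X, y, categorical_feature=[0])
print(clf.predict(X[:5]))
```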