[python-package] LGBM hangs with high number of categories #6400

Closed
mavillan opened this issue Apr 1, 2024 · 2 comments

mavillan commented Apr 1, 2024

Description

When training a model (classification or regression) on a dataset containing a categorical feature with 1035 or more categories, LightGBM hangs without printing any messages. The system monitor shows no activity from the process, and it stays stuck for a very long time: the longest I have waited is 2 hours, and it still had not finished.

This issue only occurs with LGBM versions 4.2.0 and 4.3.0.

Reproducible example

import numpy as np
import pandas as pd 
import lightgbm as lgb

num_categories = 1034  # training completes with this value
num_categories = 1035  # overrides the line above; training hangs indefinitely

X = pd.DataFrame(
    np.random.random((10000, 5)),
    columns=[f'num_{i}' for i in range(5)]
)
X["cat"] = np.arange(10000) % num_categories
y = (np.random.random(10000) > 0.5).astype(int)

dset = lgb.Dataset(X, y, categorical_feature=["cat"])
model = lgb.train({'objective':'binary', 'verbose':2}, dset, num_boost_round=10)
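Because the hang produces no output, it can help to run the reproduction in a child process with a time limit so that the calling session stays usable. The following is a minimal sketch using only the Python standard library; `run_script_with_timeout` and the `repro.py` file name are hypothetical and not part of LightGBM.

```python
import subprocess
import sys


def run_script_with_timeout(source: str, timeout_s: float) -> bool:
    """Run Python source code in a child interpreter.

    Returns True if the child exits within timeout_s seconds,
    False if it was killed for running longer than that.
    """
    try:
        subprocess.run([sys.executable, "-c", source], timeout=timeout_s, check=False)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising, so nothing keeps running.
        return False
    return True


if __name__ == "__main__":
    # Hypothetical usage: save the reproducible example as repro.py, then
    #   finished = run_script_with_timeout(open("repro.py").read(), timeout_s=300)
    finished = run_script_with_timeout("print('ok')", timeout_s=30)
    print("finished" if finished else "timed out")
```

A timeout only shows that training did not finish in the allotted time; it cannot distinguish a true hang from a very slow run, so the limit should be generous compared with the run time observed for 1034 categories.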

Environment info

LightGBM version or commit hash: 4.2.0 and 4.3.0

Command(s) you used to install LightGBM

conda install -c conda-forge lightgbm==4.3.0

Tested on macOS 14.4 (23E214).

Additional Comments

I also tested it in Kaggle (original environment 2024-02-27) and got the same issue.

@jameslamb jameslamb changed the title LGBM hangs with high number of categories [python-package] LGBM hangs with high number of categories Apr 1, 2024
jameslamb (Collaborator) commented

Thanks for using LightGBM, and for the excellent write-up!

This looks identical to the issue reported in #6273, and we have an in-progress pull request to fix it: #6394.

Sorry you're experiencing this. This is a bug that was introduced around lightgbm==4.2.0. You could try downgrading to lightgbm==4.1.0 to work around it until a release with that fix is published.

I'm going to close this as a duplicate of #6273 and add a comment there mentioning it. If you think they are different issues, please let me know.
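To decide whether the downgrade workaround applies to a given environment, one option is to compare the installed version against the versions named in this report. The sketch below assumes that only 4.2.0 and 4.3.0 are affected, which is what the reporter observed and not a confirmed, exhaustive list; `REPORTED_AFFECTED`, `is_reported_affected`, and `installed_lightgbm_version` are hypothetical names.

```python
from importlib import metadata

# Versions the reporter observed hanging. This is an assumption taken from
# the issue text, not an authoritative list from the maintainers.
REPORTED_AFFECTED = {"4.2.0", "4.3.0"}


def is_reported_affected(version: str) -> bool:
    """Return True if the version string matches one named in this report."""
    return version.strip() in REPORTED_AFFECTED


def installed_lightgbm_version():
    """Return the installed lightgbm version string, or None if not installed."""
    try:
        return metadata.version("lightgbm")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_lightgbm_version()
    if version is None:
        print("lightgbm is not installed")
    elif is_reported_affected(version):
        print(f"lightgbm {version} matches a version reported as affected")
    else:
        print(f"lightgbm {version} is not one of the versions named in this report")
```

Reading the version through `importlib.metadata` avoids importing `lightgbm` itself, so the check stays fast and works even if the package fails to load.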

jameslamb (Collaborator) commented

I also want to say... I REALLY appreciate the effort you put into making this write-up clear and the example minimal and reproducible. Made it very easy to understand what was being reported and connect it to that existing bug.

Thank you so much for your effort!
