Large bagging is very slow #628
Comments
Can you change the 0.5 on this line: https://github.com/Microsoft/LightGBM/blob/master/src/boosting/gbdt.cpp#L150 to 1e-6 and try again?
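(A rough sketch of the decision being discussed, assuming the constant acts as a bagging-fraction threshold that selects between two bagging strategies; the function name and exact comparison are illustrative, not LightGBM's actual code:)

```python
def uses_subset_bagging(bagging_fraction, threshold=0.5):
    """Hypothetical simplification of the check in src/boosting/gbdt.cpp:
    when the sampled fraction is at or below the threshold, training copies
    the bagged rows into a smaller dataset (the "subset" path) instead of
    training on index masks over the full dataset."""
    return bagging_fraction <= threshold

# With the default threshold, 0.4 takes the subset-copy path but 0.6 does
# not, which would match the report that only bagging_fraction = 0.4 is slow.
assert uses_subset_bagging(0.4) and not uses_subset_bagging(0.6)

# The suggested 1e-6 effectively disables the subset path for any
# realistic bagging fraction:
assert not uses_subset_bagging(0.4, threshold=1e-6)
```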
Can you also try the 'bagging' branch?
@guolinke Switching to 1e-6 seems to fix the issue. On the bagging branch it was the same (is the branch gone now?).
@Laurae2 Can you try the latest master branch?
@Laurae2 |
Here are some logs. I think some logs are out of place; no idea why. I'll retry with the CLI.

> Laurae::timer_func_print({model <- lgb.train(params = list(objective = "binary",
+ metric = "auc",
+ bin_construct_sample_cnt = 2250000L,
+ early_stopping_round = 25),
+ train,
+ 5,
+ list(test = test),
+ verbose = 2)})
[LightGBM] [Info] Number of positive: 742198, number of negative: 1507802
[LightGBM] [Info] Total Bins 6027180
[LightGBM] [Info] Number of data: 2250000, number of used features: 23636
[LightGBM] [Info] Trained a tree with leaves=31 and max_depth=11
[1]: test's auc:0.501172
[LightGBM] [Info] Trained a tree with leaves=31 and max_depth=11
[2]: test's auc:0.501379
[LightGBM] [Info] Trained a tree with leaves=31 and max_depth=8
[3]: test's auc:0.502558
[LightGBM] [Info] Trained a tree with leaves=31 and max_depth=12
[4]: test's auc:0.502981
[LightGBM] [Info] Trained a tree with leaves=31 and max_depth=10
[5]: test's auc:0.50398
The function ran in 37827.132 milliseconds.
[1] 37827.13
> rm(model)
> gc()
[LightGBM] [Info] GBDT::boosting costs 0.027171
[LightGBM] [Info] GBDT::train_score costs 0.012051
[LightGBM] [Info] GBDT::out_of_bag_score costs 0.000001
[LightGBM] [Info] GBDT::valid_score costs 0.006769
[LightGBM] [Info] GBDT::metric costs 0.000000
[LightGBM] [Info] GBDT::bagging costs 0.000003
[LightGBM] [Info] GBDT::bagging_subset_time costs 0.000000
[LightGBM] [Info] GBDT::reset_tree_learner_time costs 0.000000
[LightGBM] [Info] GBDT::sub_gradient costs 0.000000
[LightGBM] [Info] GBDT::tree costs 27.997018
[LightGBM] [Info] SerialTreeLearner::init_train costs 2.393088
[LightGBM] [Info] SerialTreeLearner::init_split costs 12.377513
[LightGBM] [Info] SerialTreeLearner::hist_build costs 10.837631
[LightGBM] [Info] SerialTreeLearner::find_split costs 2.226329
[LightGBM] [Info] SerialTreeLearner::split costs 0.070301
[LightGBM] [Info] SerialTreeLearner::ordered_bin costs 14.763519
used (Mb) gc trigger (Mb) max used (Mb)
Ncells 692175 37.0 1168576 62.5 1168576 62.5
Vcells 3516642 26.9 5133766 39.2 4078954 31.2
> Laurae::timer_func_print({model <- lgb.train(params = list(objective = "binary",
+ metric = "auc",
+ bin_construct_sample_cnt = 2250000L,
+ early_stopping_round = 25,
+ bagging_freq = 1,
+ bagging_seed = 1,
+ bagging_fraction = 0.6),
+ train,
+ 5,
+ list(test = test),
+ verbose = 2)})
[LightGBM] [Info] Number of positive: 742198, number of negative: 1507802
[LightGBM] [Info] Total Bins 6027180
[LightGBM] [Info] Number of data: 2250000, number of used features: 23636
[LightGBM] [Debug] Re-bagging, using 1350000 data to train
[LightGBM] [Info] Trained a tree with leaves=31 and max_depth=12
[1]: test's auc:0.500272
[LightGBM] [Debug] Re-bagging, using 1350000 data to train
[LightGBM] [Info] Trained a tree with leaves=31 and max_depth=11
[2]: test's auc:0.500702
[LightGBM] [Debug] Re-bagging, using 1350000 data to train
[LightGBM] [Info] Trained a tree with leaves=31 and max_depth=11
[3]: test's auc:0.501856
[LightGBM] [Debug] Re-bagging, using 1350000 data to train
[LightGBM] [Info] Trained a tree with leaves=31 and max_depth=11
[4]: test's auc:0.503777
[LightGBM] [Debug] Re-bagging, using 1350000 data to train
[LightGBM] [Info] Trained a tree with leaves=31 and max_depth=10
[5]: test's auc:0.50587
The function ran in 24566.072 milliseconds.
[1] 24566.07
>
>
> rm(model)
> gc()
[LightGBM] [Info] GBDT::boosting costs 0.079639
[LightGBM] [Info] GBDT::train_score costs 0.025451
[LightGBM] [Info] GBDT::out_of_bag_score costs 0.065390
[LightGBM] [Info] GBDT::valid_score costs 0.016837
[LightGBM] [Info] GBDT::metric costs 0.000000
[LightGBM] [Info] GBDT::bagging costs 0.013739
[LightGBM] [Info] GBDT::bagging_subset_time costs 0.000000
[LightGBM] [Info] GBDT::reset_tree_learner_time costs 0.000000
[LightGBM] [Info] GBDT::sub_gradient costs 0.000000
[LightGBM] [Info] GBDT::tree costs 50.325836
[LightGBM] [Info] SerialTreeLearner::init_train costs 6.186171
[LightGBM] [Info] SerialTreeLearner::init_split costs 20.463570
[LightGBM] [Info] SerialTreeLearner::hist_build costs 18.954661
[LightGBM] [Info] SerialTreeLearner::find_split costs 4.454569
[LightGBM] [Info] SerialTreeLearner::split costs 0.111960
[LightGBM] [Info] SerialTreeLearner::ordered_bin costs 26.636777
used (Mb) gc trigger (Mb) max used (Mb)
Ncells 694113 37.1 1168576 62.5 1168576 62.5
Vcells 3522613 26.9 6240519 47.7 4383248 33.5
> Laurae::timer_func_print({model <- lgb.train(params = list(objective = "binary",
+ metric = "auc",
+ bin_construct_sample_cnt = 2250000L,
+ early_stopping_round = 25,
+ bagging_freq = 1,
+ bagging_seed = 1,
+ bagging_fraction = 0.4),
+ train,
+ 5,
+ list(test = test),
+ verbose = 2)})
[LightGBM] [Info] Number of positive: 742198, number of negative: 1507802
[LightGBM] [Info] Total Bins 6027180
[LightGBM] [Info] Number of data: 2250000, number of used features: 23636
[LightGBM] [Debug] use subset for bagging
[LightGBM] [Debug] Re-bagging, using 900000 data to train
[LightGBM] [Info] Trained a tree with leaves=31 and max_depth=11
[1]: test's auc:0.501405
[LightGBM] [Debug] Re-bagging, using 900000 data to train
[LightGBM] [Info] Trained a tree with leaves=31 and max_depth=13
[2]: test's auc:0.502849
[LightGBM] [Debug] Re-bagging, using 900000 data to train
[LightGBM] [Info] Trained a tree with leaves=31 and max_depth=10
[3]: test's auc:0.504528
[LightGBM] [Debug] Re-bagging, using 900000 data to train
[LightGBM] [Info] Trained a tree with leaves=31 and max_depth=9
[4]: test's auc:0.506207
[LightGBM] [Debug] Re-bagging, using 900000 data to train
[LightGBM] [Info] Trained a tree with leaves=31 and max_depth=13
[5]: test's auc:0.506727
The function ran in 90240.890 milliseconds.
[1] 90240.89
> rm(model)
> gc()
[LightGBM] [Info] GBDT::boosting costs 0.165529
[LightGBM] [Info] GBDT::train_score costs 0.124710
[LightGBM] [Info] GBDT::out_of_bag_score costs 0.065391
[LightGBM] [Info] GBDT::valid_score costs 0.023227
[LightGBM] [Info] GBDT::metric costs 0.000000
[LightGBM] [Info] GBDT::bagging costs 76.685486
[LightGBM] [Info] GBDT::bagging_subset_time costs 28.801937
[LightGBM] [Info] GBDT::reset_tree_learner_time costs 47.856741
[LightGBM] [Info] GBDT::sub_gradient costs 0.007842
[LightGBM] [Info] GBDT::tree costs 61.569484
[LightGBM] [Info] SerialTreeLearner::init_train costs 7.088206
[LightGBM] [Info] SerialTreeLearner::init_split costs 26.233300
[LightGBM] [Info] SerialTreeLearner::hist_build costs 21.110779
[LightGBM] [Info] SerialTreeLearner::find_split costs 6.784831
[LightGBM] [Info] SerialTreeLearner::split costs 0.141303
[LightGBM] [Info] SerialTreeLearner::ordered_bin costs 33.290067
used (Mb) gc trigger (Mb) max used (Mb)
Ncells 694853 37.2 1168576 62.5 1168576 62.5
Vcells 3523226 26.9 6240519 47.7 4389355 33.5
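A quick sanity check on the timers from the bagging_fraction = 0.4 run above: the per-iteration subset copy and the tree-learner reset together explain essentially all of the bagging cost, and bagging alone is around 85% of the run's wall time.

```python
# Timer values (seconds) copied from the bagging_fraction = 0.4 log above.
bagging = 76.685486            # GBDT::bagging
subset_copy = 28.801937        # GBDT::bagging_subset_time
learner_reset = 47.856741      # GBDT::reset_tree_learner_time
total_ms = 90240.890           # wall time reported by timer_func_print

# Subset copy + learner reset account for nearly all of the bagging cost.
assert abs((subset_copy + learner_reset) - bagging) < 1.0

# Bagging dominates the total run time (> 80%).
assert bagging / (total_ms / 1000.0) > 0.8
```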
@guolinke Better logs below:
@guolinke This is with the 1e-6 fix:
@Laurae2 Thanks for the help.
@guolinke I am getting a segmentation fault instead now.
@Laurae2 Sorry, it had a bug. I just used "push -f" to fix it.
Bagging is very slow, and I am not sure what is causing it. See #562 for the dataset. The slowdown appears with 0.40 subsampling but is not reproducible with 0.60; I suspect bagging is using only one core.
Using a DLL compiled with Visual Studio 2017.
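To illustrate the two bagging strategies being compared in this thread, here is a toy Python sketch (all names are my own; this is not LightGBM's implementation): the subset path materializes a new, smaller dataset every iteration, while the mask path trains directly on the full dataset through sampled indices. Both produce the same result; the subset path pays an extra per-iteration copy.

```python
import random

def bag_indices(n, fraction, rng):
    """Sample the bagged row indices for one iteration."""
    k = int(n * fraction)
    return rng.sample(range(n), k)

def train_iter_with_copy(data, idx):
    # Subset path: materialize a smaller dataset each iteration,
    # then train on it (sum() stands in for tree construction).
    subset = [data[i] for i in idx]
    return sum(subset)

def train_iter_with_mask(data, idx):
    # Mask path: train on the full dataset through the sampled indices,
    # with no intermediate copy.
    return sum(data[i] for i in idx)

rng = random.Random(1)
data = list(range(1000))
idx = bag_indices(len(data), 0.4, rng)
assert len(idx) == 400
assert train_iter_with_copy(data, idx) == train_iter_with_mask(data, idx)
```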