Binary format needed - slow model reading #372
@guolinke Do you think a binary format for models is appropriate? (and to add to the API) This is what xgboost does for speeding up model saving/loading. Just for comparison:
I tested on a private dataset with 10,000 iterations and 256 leaves, using a 2GBps PCI-E SSD to make sure the SSD is not a bottleneck, run 10x, reported in milliseconds:
N.B.: xgboost via RDS showed a consistent prediction-speed loss (tested .save and .RDS prediction time 50 times), but it is usable as-is, unlike LightGBM (which uses .save/.load indirectly via RDS to be re-usable).
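The speed gap between text and binary model files comes mostly from parsing, not disk I/O. A minimal stdlib-only sketch of the effect (this is not LightGBM's actual format; JSON and pickle here stand in for a text model file and a hypothetical binary one):

```python
import json
import pickle
import random
import time

# Stand-in for a large tree ensemble: a big list of floating-point
# split thresholds and leaf values.
model_like = [random.random() for _ in range(200_000)]

def round_trip_text(obj):
    """Serialize to text (JSON) and parse it back, as loading a text
    model file would."""
    start = time.perf_counter()
    restored = json.loads(json.dumps(obj))
    return time.perf_counter() - start, restored

def round_trip_binary(obj):
    """Serialize to a binary blob (pickle) and load it back; no
    number-to-string parsing is involved."""
    start = time.perf_counter()
    restored = pickle.loads(pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL))
    return time.perf_counter() - start, restored

text_time, text_obj = round_trip_text(model_like)
bin_time, bin_obj = round_trip_binary(model_like)
print(f"text round trip:   {text_time:.4f}s")
print(f"binary round trip: {bin_time:.4f}s")
```

On typical machines the binary round trip is several times faster, because the text path has to format and re-parse every floating-point number.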
@Laurae2 yes, the binary model format is needed. But I am busy with other things recently.
@guolinke There is no flag in the current interface of the GBDT::SaveModelToFile(int num_iteration, const char* filename) method to indicate whether binary or text mode is required.
@limexp
@guolinke It's hard to change a binary file format in the future without breaking existing saved models; avoiding that would require adding version support, which makes the code more complex. Of course, there are universal solutions like protobuf. The decision about the interface is not so critical, but it has a direct impact on the codebase, tests and compatibility. I'm not asking for a final solution, just looking for a direction if you have one.
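The versioning concern raised above is usually handled with a small self-describing header. A hedged sketch of the pattern (the magic bytes, version number and layout here are invented for illustration, not LightGBM's real binary format):

```python
import struct

MAGIC = b"LGBM"        # hypothetical magic bytes identifying the file type
CURRENT_VERSION = 2    # hypothetical format version this code can write

def write_model(path, payload):
    """Write a binary model as: magic + little-endian (version, length)
    header + raw payload bytes."""
    with open(path, "wb") as f:
        f.write(MAGIC)
        f.write(struct.pack("<II", CURRENT_VERSION, len(payload)))
        f.write(payload)

def read_model(path):
    """Read a binary model, rejecting unknown files and files written by
    a newer, incompatible format version."""
    with open(path, "rb") as f:
        if f.read(4) != MAGIC:
            raise ValueError("not a binary model file")
        version, length = struct.unpack("<II", f.read(8))
        if version > CURRENT_VERSION:
            raise ValueError(f"model written by a newer format (v{version})")
        return version, f.read(length)

# Usage: round-trip a dummy payload through a temporary file.
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "model.bin")
write_model(path, b"tree data goes here")
version, payload = read_model(path)
```

Old readers fail loudly on new files instead of silently misparsing them, and new readers can branch on `version` to keep loading old files.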
AFAIU, concerning the benchmark:
Anyone interested in helping test loading and saving models with protobuf? It's in the branch https://github.com/wxchan/LightGBM/tree/proto; you can build it with cmake and a simple test:
I personally find the text format the best feature of LightGBM. You can easily check things like how many trees are being used without any additional commands, which is complicated with binary models.
Seems like the protobuf model has been merged, so this issue can be closed?
It might be reverted; we are looking for a better solution. @AbdealiJK
I think model read/write is much faster now. Please have a try.
Yes, it is much faster now... great job.
Any solution to this? I have trained and saved a LGB model and the file is almost 18 GB.
This issue has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.
I have built many models, some of them large (300 MB files or bigger, e.g. 10k trees).
In such cases the predict phase is slow (not too much, but when one is using stacking with cross-validation this can slow down prediction calculation badly, ~2 hours).
I found out the problem is not calling model.predict(); that is reasonably fast.
The problem is loading the model from disk:
model = lg.Booster(model_file = workingDir + '/modely/model_' + str(cv) + '_' + str(sc) + '.txt')
Is there any way to speed this up?
There is a save_binary option, but only for datasets.
I am saving models with:
model.save_model(working_dir + '/modely/model_' + str(cv) + '_' + str(sc) + '.txt', num_iteration = model.best_iteration)
thx for any help..
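When the same text model must be loaded on every run, one generic workaround is to parse it once and cache the in-memory object with pickle, so later runs skip the slow text parsing. A stdlib-only sketch (the `slow_parse` function and the file layout are hypothetical stand-ins; substitute whatever slow text loader you actually call):

```python
import os
import pickle
import tempfile

def load_model_cached(text_path, parse_fn):
    """Load a model from its text file, caching the parsed object as a
    pickle next to it. If the pickle is at least as new as the text
    file, it is loaded instead of re-parsing the text."""
    cache_path = text_path + ".pkl"
    if (os.path.exists(cache_path)
            and os.path.getmtime(cache_path) >= os.path.getmtime(text_path)):
        with open(cache_path, "rb") as f:
            return pickle.load(f)
    model = parse_fn(text_path)
    with open(cache_path, "wb") as f:
        pickle.dump(model, f, protocol=pickle.HIGHEST_PROTOCOL)
    return model

# Demo with a stand-in "model": a text file of numbers and a slow parser.
text_path = os.path.join(tempfile.mkdtemp(), "model.txt")
with open(text_path, "w") as f:
    f.write("\n".join(str(i * 0.5) for i in range(1000)))

def slow_parse(path):
    """Hypothetical slow text parser standing in for a real model loader."""
    with open(path) as f:
        return [float(line) for line in f]

first = load_model_cached(text_path, slow_parse)   # parses the text file
second = load_model_cached(text_path, slow_parse)  # served from the pickle cache
```

Whether this pays off for a real LightGBM Booster depends on how the object pickles internally, so it is worth timing both paths on your own models before committing to it.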