Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Check max_bin, etc. match config when using binary #3592

Merged
merged 2 commits into from
Dec 5, 2020

Conversation

cyfdecyf
Copy link
Contributor

To avoid binary file with non-matching parameters from being used.

@ghost
Copy link

ghost commented Nov 24, 2020

CLA assistant check
All CLA requirements met.

@guolinke
Copy link
Collaborator

Thank you @cyfdecyf

@StrikerRUS
Copy link
Collaborator

@guolinke What about other Dataset params from https://lightgbm.readthedocs.io/en/latest/Parameters.html#dataset-parameters, e.g. max_bin_by_feature, bin_construct_sample_cnt? Should they also be checked?

@guolinke
Copy link
Collaborator

@StrikerRUS yes, thank you.
@cyfdecyf can you add these parameters too?

@cyfdecyf
Copy link
Contributor Author

@guolinke OK. I will add checking for those parameters too.

@StrikerRUS
Copy link
Collaborator

I will add checking for those parameters too.

I think there are more params in Dataset Parameters section that can be checked.

Also, please refer to

static void CheckDatasetResetConfig(
.

@cyfdecyf
Copy link
Contributor Author

cyfdecyf commented Dec 1, 2020

I focused only on checking parameters that are stored in Dataset

std::string data_filename_;

What's the desired way to check parameters that's not in Dataset now? e.g. feature_pre_filter, data_random_seed etc.
Storing them in the binary file and only load them for checking? That would be a bigger change than this PR.

I will add checking for those parameters too.

I think there are more params in Dataset Parameters section that can be checked.

@guolinke
Copy link
Collaborator

guolinke commented Dec 3, 2020

@cyfdecyf yes, I think only check the stored parameter in the Dataset binary object is enough.
you can ping me once the PR is ready.

@cyfdecyf
Copy link
Contributor Author

cyfdecyf commented Dec 3, 2020

@guolinke I've already added checking for max_bin_by_feature, bin_construct_sample_cnt in 9d19083.

I've checked parameters listed in https://lightgbm.readthedocs.io/en/latest/Parameters.html#dataset-parameters against private members in Dataset, but not sure if there's any other parameters that still need to be checked.

@guolinke
Copy link
Collaborator

guolinke commented Dec 4, 2020

current implementation looks good to me. ping @StrikerRUS to confirm.

@StrikerRUS
Copy link
Collaborator

OK, I agree!

Thanks a lot @cyfdecyf for contribution!

@github-actions
Copy link

This pull request has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 24, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants