Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MRG] check duplicate params #961

Merged
merged 10 commits into from
Oct 7, 2017
Merged

[MRG] check duplicate params #961

merged 10 commits into from
Oct 7, 2017

Conversation

wxchan
Copy link
Contributor

@wxchan wxchan commented Oct 3, 2017

  1. check params which are alias to each other and print warnings
  2. add get_parameter function to let user check current parameters, easy to debug

@guolinke
Copy link
Collaborator

guolinke commented Oct 4, 2017

@wxchan I think this solution is too heavy.
If we want to add a new parameter, it needs to edit many places.

Maybe we can refactor the whole config class, for example, we can put all parameters into a unordered_map, and retrieve the value by using its names.

@@ -1397,6 +1397,50 @@ def reset_parameter(self, params):
self.handle,
c_str(params_str)))

def get_parameter(self, is_output_raw=False):
Copy link
Collaborator

@StrikerRUS StrikerRUS Oct 4, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that get_parameter in python API is needed anyway, but please update docstring with the aim of uniformity with other docstrings.

"""Get parameters of Booster.

Parameters
----------
is_output_raw : bool, optional (default=False)
    Whether to output raw string.

Returns
-------
param : dict or string
    Parameters of Booster.
    If ``is_output_raw=False``, dictionary of parameters with string key and string value;
    otherwise, string of parameters as concat of "key:value".
"""

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I already revert this. Looking for better solution.

@wxchan
Copy link
Contributor Author

wxchan commented Oct 4, 2017

@guolinke I agree. But what's your solution to save different types of values in unordered_map, a common way for me is std::variant, but it's not supported until c++17.

@wxchan
Copy link
Contributor Author

wxchan commented Oct 4, 2017

@guolinke btw, there may be some bug in python categorical feature, is Dataset and Booster using the same Overall Config in c_api now?

@guolinke
Copy link
Collaborator

guolinke commented Oct 5, 2017

@wxchan there using the different configs.

BTW, we can define a class, e.g.
class ParamVal{
int int_val;
double double_val;
bool bool_val;
std::string str_val;
std::vector double_arr_val;
...
}

And we can parse all these types when read a parameter, and retrieve the type we need.

@guolinke
Copy link
Collaborator

guolinke commented Oct 5, 2017

@wxchan another solution is using multi maps.

for example,

unordered_map::<string, int> int_map;
unordered_map::<string, double> double_map;
...

and using another map to check its types.

However, my concern is such a dynamic solution may cause the performance issue.

Can we simplify the GetParam and make it automatically ?

@wxchan
Copy link
Contributor Author

wxchan commented Oct 5, 2017

@guolinke one possible solution is to save params to a map in GetInt/GetDouble/GetBool/GetString or KeyAliasTransform. I can revert get_parameter if there is no good solutions now.

@wxchan
Copy link
Contributor Author

wxchan commented Oct 5, 2017

@guolinke if they are using overall config, when using get_parameter, I can only get parameters from overall config of Booster? How can I know whether overall config of Dataset is working?

@guolinke
Copy link
Collaborator

guolinke commented Oct 5, 2017

@wxchan currently, Dataset class doesn't store the parameters. The parameters are used temporarily when constructing Dataset.

@wxchan
Copy link
Contributor Author

wxchan commented Oct 5, 2017

@guolinke so with get_parameter, the result might not be the actual parameters used for datasets?

@guolinke
Copy link
Collaborator

guolinke commented Oct 5, 2017

@wxchan yeah. So you may need to store a IOConfig in the Dataset class.

@wxchan
Copy link
Contributor Author

wxchan commented Oct 5, 2017

@guolinke so how can I quickly confirm whether the categorical feature setting works? like in #893 . I think current logic should be right, though it's a little messy.

@wxchan
Copy link
Contributor Author

wxchan commented Oct 5, 2017

@guolinke I revert get_parameter first, you can review other part.

@wxchan wxchan changed the title [WIP] check duplicate params & add get_parameter [MRG] check duplicate params Oct 5, 2017
@guolinke
Copy link
Collaborator

guolinke commented Oct 5, 2017

@wxchan
In cpp side, the feature indices of the categorical features could be restored by using the bin_mappers and feature indices mappers.
in python side, the parameter of categorical features could be stored in the Dataset, and using that to check the settings.

if (alias != alias_table.end()) { // found alias
auto alias_set = tmp_map.find(alias->second);
if (alias_set != tmp_map.end()) { // alias already set
// set priority alphabetically to ensure reproducible behavior
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe using the strlen for the priority is better? since users often use the short names.

@@ -81,6 +81,7 @@ struct ConfigBase {
const std::unordered_map<std::string, std::string>& params,
const std::string& name, bool* out);

static void KVIntoMap(std::unordered_map<std::string, std::string>& params, const char* kv);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

KV2Map ?

@guolinke guolinke merged commit d3e88ef into microsoft:master Oct 7, 2017
@wxchan wxchan deleted the param branch October 7, 2017 09:27
guolinke pushed a commit that referenced this pull request Oct 9, 2017
* add get params

clean codes

* check duplicate params

* Revert "add get params"

This reverts commit ec1d8dd.

* set priority by length & check duplicate

* rename function
@lock lock bot locked as resolved and limited conversation to collaborators Mar 11, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants