This repository has been archived by the owner on Oct 8, 2019. It is now read-only.
Makoto YUI edited this page Mar 16, 2016 · 5 revisions
Expanding label counts into the actual number of samples can improve accuracy in some cases. binarize_label explodes a record that holds the counts of positive and negative samples into that many individual labeled rows. For example,
| positive | negative | features |
|:--------:|:--------:|:---------|
| 2 | 3 | "[a:1, b:2]" |
is converted into
| features | label |
|:---------|:-----:|
| "[a:1, b:2]" | 1 |
| "[a:1, b:2]" | 1 |
| "[a:1, b:2]" | 0 |
| "[a:1, b:2]" | 0 |
| "[a:1, b:2]" | 0 |
Caution: Don't forget to shuffle the converted training instances into random order, e.g., by CLUSTER BY rand().
binarize_label(int/long positive, int/long negative, ANY arg1, ANY arg2, ..., ANY argN)
returns (ANY arg1, ANY arg2, ..., ANY argN, int label) where label is 0 or 1.
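
The expansion described by this signature can be modeled outside Hive. The following is a minimal Python sketch of the semantics only, not Hivemall's implementation. It assumes that label 1 marks positive samples and label 0 marks negative ones, and that positive rows are emitted first (the page does not specify row order). The helper `binarize_and_shuffle` and its `seed` parameter are hypothetical names that stand in for the CLUSTER BY rand() step.

```python
import random
from typing import Any, Iterable, Iterator, List, Optional, Tuple


def binarize_label(positive: int, negative: int, *args: Any) -> Iterator[Tuple[Any, ...]]:
    """Model of the expansion: one output row per counted sample.

    Assumption: positives carry label 1 and negatives label 0,
    and positive rows are emitted before negative rows.
    """
    for _ in range(positive):
        yield (*args, 1)
    for _ in range(negative):
        yield (*args, 0)


def binarize_and_shuffle(
    records: Iterable[Tuple[Any, ...]], seed: Optional[int] = None
) -> List[Tuple[Any, ...]]:
    """Hypothetical helper: expand every (positive, negative, arg1, ...) record,
    then shuffle the result, mirroring the advice to randomize row order."""
    rows = [
        row
        for positive, negative, *rest in records
        for row in binarize_label(positive, negative, *rest)
    ]
    random.Random(seed).shuffle(rows)
    return rows


if __name__ == "__main__":
    # One record with 2 positive and 3 negative samples becomes 5 labeled rows.
    for row in binarize_and_shuffle([(2, 3, "[a:1, b:2]")], seed=42):
        print(row)
```

Shuffling matters because the unshuffled expansion places identical rows with the same label next to each other, which can bias learners that process training rows sequentially.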