This repository has been archived by the owner on Jun 22, 2022. It is now read-only.
LightGBM and 5fold CV
Kamil A. Kaczmarek edited this page Jul 10, 2018 · 2 revisions
Validation is 5-fold CV with folds generated by a custom splitter: observations are sorted by target value and then dealt into the separate folds one by one, so each fold receives targets from across the whole sorted range. It is implemented in utils.py:L111
    import numpy as np
    from sklearn.model_selection import BaseCrossValidator

    class KFoldByTargetValue(BaseCrossValidator):
        def __init__(self, n_splits=3, shuffle=False, random_state=None):
            self.n_splits = n_splits
            self.shuffle = shuffle  # stored, but not used by the splitting logic below
            self.random_state = random_state  # stored, but not used below

        def _iter_test_indices(self, X, y=None, groups=None):
            # Samples are ordered by the values passed as X, so pass the target here.
            n_samples = X.shape[0]
            indices = np.arange(n_samples)
            sorted_idx_vals = sorted(zip(indices, X), key=lambda x: x[1])
            indices = [idx for idx, val in sorted_idx_vals]
            # Deal the sorted indices into the folds round-robin.
            for split_start in range(self.n_splits):
                split_indices = indices[split_start::self.n_splits]
                yield split_indices

        def get_n_splits(self, X=None, y=None, groups=None):
            return self.n_splits
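To make the fold assignment concrete, here is a minimal standalone sketch of the same round-robin idea in plain Python, without scikit-learn. The function name `folds_by_target_value` and the toy target values are illustrative assumptions, not code from the repository.

```python
def folds_by_target_value(y, n_splits=5):
    """Return n_splits lists of test indices, dealt round-robin
    from the sample indices sorted by target value."""
    # Sort sample indices by their target value (ties keep input order).
    order = sorted(range(len(y)), key=lambda i: y[i])
    # Fold k takes every n_splits-th sorted index, starting at position k.
    return [order[k::n_splits] for k in range(n_splits)]


# Toy target values, made up for illustration only.
y = [30.0, 10.0, 50.0, 20.0, 40.0, 60.0, 80.0, 70.0, 100.0, 90.0]
folds = folds_by_target_value(y, n_splits=5)

for k, test_idx in enumerate(folds):
    # Show which samples, and which target values, land in each test fold.
    print(k, test_idx, [y[i] for i in test_idx])
```

On this toy data every test fold pairs one low target with one high target (for example 10 and 60 in the first fold), which is the balancing effect the sorted, one-by-one assignment is after.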
- drop constant columns
- drop duplicate columns
- drop columns that are zero more than a given % of the time
- as is (taken directly from competition data)
- LightGBM (raw): 1.39 CV, 1.43 Public LB
- zero treated as missing
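The three column filters listed above (constant, duplicate, mostly zero) could be sketched roughly as follows. This is a plain-Python illustration under stated assumptions, not the repository's implementation: the function name `drop_uninformative_columns`, the dict-of-lists data layout, and the `max_zero_fraction` value are all hypothetical, and the page does not state the zero threshold actually used.

```python
def drop_uninformative_columns(columns, max_zero_fraction):
    """columns: dict mapping column name -> list of numeric values.
    Returns a new dict without constant, duplicate, or mostly-zero columns."""
    kept = {}
    seen = set()  # value tuples of the columns kept so far
    for name, values in columns.items():
        # 1. Constant column: at most one distinct value.
        if len(set(values)) <= 1:
            continue
        # 2. Duplicate column: same values as an earlier kept column.
        key = tuple(values)
        if key in seen:
            continue
        # 3. Mostly-zero column: share of zeros above the threshold.
        zero_fraction = sum(1 for v in values if v == 0) / len(values)
        if zero_fraction > max_zero_fraction:
            continue
        seen.add(key)
        kept[name] = values
    return kept


# Toy data, made up for illustration only.
columns = {
    "a": [1, 2, 3, 4],
    "const": [7, 7, 7, 7],   # constant
    "a_copy": [1, 2, 3, 4],  # duplicate of "a"
    "sparse": [0, 0, 0, 5],  # 75% zeros
    "b": [0, 1, 0, 2],       # 50% zeros
}
# The 0.6 threshold is an arbitrary choice for this toy example.
kept = drop_uninformative_columns(columns, max_zero_fraction=0.6)
print(list(kept))
```

With the arbitrary 0.6 threshold, only `a` and `b` survive here; on real data the threshold would need to be chosen by validation.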
Check out our GitHub organization https://github.com/neptune-ml for more cool stuff 😃
Kamil & Kuba, core contributors
- 🐝 LightGBM and 5fold CV
- 🪲 LightGBM on binarized dataset
- 🐪 LightGBM with row aggregations
- 🐳 LightGBM on dimension reduced dataset
- 🐃 Exploring various dimension reduction techniques
- 🐡 bucketing row aggregations