Balanced sampling scheme #332

markusnagel · 2016-03-10T17:04:58Z

I created an iteration scheme that allows balanced sampling from different classes/groups/clusters. This is relevant for classification tasks with class imbalance, which is common in real-world problems. If a dataset is very skewed, for example, the learning algorithm might ignore the small classes (especially in the early stages of training). Ways to overcome this are either to weight the underrepresented classes or to sample an equal number of examples from each class.
This iteration scheme focuses on the latter and enables Fuel to do such equal sampling. It allows both subsampling (i.e. downsampling the overrepresented classes) and upsampling (i.e. sampling the underrepresented classes more often, with replacement). The number of samples per class can be specified manually. The scheme is not only applicable to classification; it can be used for any kind of group that should be represented equally in the training set (e.g. results from clustering, to avoid overly similar examples in semi-supervised learning).
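To illustrate the idea, here is a minimal NumPy sketch of balanced index sampling. It is not the code from this PR and does not use Fuel's API; the function name `balanced_indices` and its parameters are hypothetical, chosen only for illustration. It draws the same number of indices from every class, sampling with replacement only when a class has fewer members than requested.

```python
import numpy as np


def balanced_indices(labels, samples_per_class, rng=None):
    """Return shuffled indices with `samples_per_class` examples per class.

    Hypothetical helper for illustration; not part of Fuel.
    """
    rng = np.random.default_rng() if rng is None else rng
    labels = np.asarray(labels)
    picked = []
    for cls in np.unique(labels):
        members = np.flatnonzero(labels == cls)
        # Upsample (with replacement) only if the class is too small;
        # otherwise subsample without replacement.
        replace = len(members) < samples_per_class
        picked.append(
            rng.choice(members, size=samples_per_class, replace=replace)
        )
    indices = np.concatenate(picked)
    rng.shuffle(indices)  # mix the classes within the epoch
    return indices


# Skewed toy data: 90 examples of class 0, 10 of class 1.
labels = np.array([0] * 90 + [1] * 10)
idx = balanced_indices(labels, samples_per_class=20,
                       rng=np.random.default_rng(0))
counts = np.bincount(labels[idx])  # per-class counts in the sample
```

With 20 samples per class, class 0 is subsampled and class 1 is upsampled, so both end up equally represented in the returned indices.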

markusnagel added 4 commits March 3, 2016 15:44

First version of the balanced sampling scheme.

691dbea

Added test for balanced sampling scheme.

0fa6e1d

Small changes in the documentation according to comments in the PR.

abc3ac9

Minor change in the documentation.

e6f6aa2
