I created an iteration scheme that allows balanced sampling from different classes, groups, or clusters. This is relevant for classification tasks with class imbalance, which is common in real-world problems. If, for example, a dataset is very skewed, the learning algorithm might ignore the small classes (especially in the early stages of training). Two ways to overcome this are to weight the underrepresented classes or to sample an equal number of examples from each class.
This iteration scheme focuses on the latter and enables Fuel to do such equal sampling. It supports both subsampling (i.e. downsampling the overrepresented classes) and upsampling (i.e. sampling the underrepresented classes more often, with replacement). The number of samples per class can be specified manually. The scheme is not limited to classification: it can be used for any kind of groups that should be represented equally in the training set (e.g. clusters from a clustering step, to avoid too-similar examples in semi-supervised learning).
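To illustrate the idea, here is a minimal sketch of equal sampling in plain Python. It is independent of Fuel's scheme classes and is not the code in this PR; the function name `balanced_indices` and its parameters are hypothetical. It assumes that when no count is given, every class is downsampled to the size of the smallest class, and that a class is drawn with replacement only when more samples are requested than it contains.

```python
import random
from collections import defaultdict


def balanced_indices(labels, samples_per_class=None, seed=0):
    """Return shuffled example indices with the same count for every class.

    Hypothetical helper for illustration only; not part of Fuel.
    """
    rng = random.Random(seed)

    # Group example indices by their class (or group/cluster) label.
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    if samples_per_class is None:
        # Assumed default: downsample to the size of the smallest class.
        samples_per_class = min(len(members) for members in by_class.values())

    chosen = []
    for members in by_class.values():
        if samples_per_class <= len(members):
            # Subsampling: draw without replacement.
            chosen.extend(rng.sample(members, samples_per_class))
        else:
            # Upsampling: draw with replacement, so examples may repeat.
            chosen.extend(rng.choices(members, k=samples_per_class))

    # Mix the classes so consecutive examples are not grouped by label.
    rng.shuffle(chosen)
    return chosen


# A skewed toy dataset: 8 examples of class "a", 2 of class "b".
labels = ["a"] * 8 + ["b"] * 2
downsampled = balanced_indices(labels)                       # 2 per class
upsampled = balanced_indices(labels, samples_per_class=5)    # 5 per class
```

In this sketch an upsampled class is drawn entirely with replacement, so some of its examples may not appear in a given pass. Keeping every original example and only adding random duplicates is an equally reasonable variant.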