Balanced sampling scheme #332

Open
wants to merge 4 commits into
base: master

Conversation

markusnagel

I created an iteration scheme that allows balanced sampling from different classes, groups, or clusters. This is relevant for classification tasks with class imbalance, which is common in real-world problems. If a dataset is heavily skewed, for example, the learning algorithm may ignore the small classes, especially in the early stages of training. Two ways to overcome this are to weight the underrepresented classes or to sample an equal number of examples from each class.
This iteration scheme takes the latter approach and enables Fuel to do such equal sampling. It supports both subsampling (downsampling the overrepresented classes) and upsampling (sampling the underrepresented classes more often, with replacement). The number of samples per class can be specified manually. The scheme is not limited to classification: it can be used for any kind of group that should be represented equally in the training set (e.g. clusters from a clustering step, to avoid overly similar examples in semi-supervised learning).
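
To illustrate the idea outside of Fuel, here is a minimal standalone sketch of equal per-class sampling. It is not the code in this PR and does not use Fuel's API; the function name `balanced_indices` and its parameters `samples_per_class` and `seed` are hypothetical, chosen only for this example. It assumes that the default behaviour is to downsample every class to the size of the smallest one, and that upsampling with replacement happens only when more samples are requested than a class contains.

```python
import random
from collections import defaultdict


def balanced_indices(labels, samples_per_class=None, seed=None):
    """Return shuffled example indices with an equal count per class.

    Hypothetical helper for illustration only, not part of Fuel.
    """
    rng = random.Random(seed)

    # Group example indices by their class (or group/cluster) label.
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    # Assumed default: downsample every class to the smallest class size.
    if samples_per_class is None:
        samples_per_class = min(len(members) for members in by_class.values())

    chosen = []
    for members in by_class.values():
        if samples_per_class <= len(members):
            # Subsampling: draw without replacement.
            chosen.extend(rng.sample(members, samples_per_class))
        else:
            # Upsampling: the class is too small, so draw with replacement.
            chosen.extend(rng.choices(members, k=samples_per_class))

    # Shuffle so that classes are interleaved rather than grouped.
    rng.shuffle(chosen)
    return chosen
```

With 90 examples of one class and 10 of another, calling `balanced_indices(labels)` would yield 10 indices per class, while `balanced_indices(labels, samples_per_class=50)` would subsample the large class and draw the small class with replacement.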
