hanfei1986/Undersampling-of-imbalanced-data-with-RandomUnderSampler-and-others

Undersampling-of-imbalanced-data-with-RandomUnderSampler-and-others

Imbalanced data are common in real-world problems, especially in anomaly-detection tasks. Handling the imbalance matters: otherwise, a model's predictions are biased towards the majority class. RandomUnderSampler, ClusterCentroids, CondensedNearestNeighbour, and similar tools undersample the data by reducing the number of samples in the majority classes.

The "Poor" and "Good" classes have far fewer samples than the "Standard" class:

image
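A class distribution like the one above can be checked with a few lines of Python. This is a minimal sketch: the `labels` list holds toy values made up for illustration (it is not the repository's dataset), and `class_distribution` is a hypothetical helper name.

```python
from collections import Counter

# Toy labels for illustration only; NOT the repository's dataset.
labels = ["Standard"] * 8 + ["Poor"] * 3 + ["Good"] * 1


def class_distribution(y):
    """Return {class: (count, fraction of all samples)}."""
    counts = Counter(y)
    total = len(y)
    return {cls: (n, n / total) for cls, n in counts.items()}


distribution = class_distribution(labels)

# Print classes from most to least frequent.
for cls, (n, frac) in sorted(distribution.items(), key=lambda kv: -kv[1][0]):
    print(f"{cls:<10}{n:>4}{frac:>8.1%}")
```

A large gap between the most and least frequent classes is the signal that some form of resampling may be worth trying.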

The predictions are biased towards the majority class:

image
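One way to make this bias visible is to compare recall class by class, since overall accuracy can look acceptable while minority classes are rarely predicted. The sketch below uses toy `y_true` and `y_pred` lists invented for illustration (not the repository's results), and `per_class_recall` is a hypothetical helper name.

```python
from collections import Counter


def per_class_recall(y_true, y_pred):
    """Fraction of each true class that was predicted correctly."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {cls: hits[cls] / totals[cls] for cls in totals}


# Toy example: a model that predicts "Standard" almost everywhere.
y_true = ["Standard"] * 6 + ["Poor"] * 3 + ["Good"] * 1
y_pred = ["Standard"] * 6 + ["Standard", "Standard", "Poor"] + ["Standard"]

recall = per_class_recall(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

print("accuracy:", accuracy)
for cls, r in recall.items():
    print(f"recall[{cls}] = {r:.2f}")
```

In this toy case the majority class has perfect recall while the minority classes are mostly missed, which is the pattern described above.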

Undersampling with RandomUnderSampler:

image
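The idea behind random undersampling can be sketched in plain Python: keep a random subset of each class so that every class ends up with as many samples as the smallest one. This is an illustrative re-implementation under that assumption, not imbalanced-learn's code; `random_undersample` and the toy `X` and `y` are hypothetical. In practice, imbalanced-learn's `RandomUnderSampler` exposes this behaviour through `fit_resample(X, y)`.

```python
import random
from collections import Counter, defaultdict


def random_undersample(X, y, seed=0):
    """Randomly drop samples so every class matches the smallest class size."""
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)

    # Target size: the number of samples in the smallest class.
    n_min = min(len(idx) for idx in by_class.values())

    # Sample without replacement from each class, then restore original order.
    keep = []
    for idx in by_class.values():
        keep.extend(rng.sample(idx, n_min))
    keep.sort()

    return [X[i] for i in keep], [y[i] for i in keep]


# Toy data for illustration only; each feature row stores its own index.
y = ["Standard"] * 8 + ["Poor"] * 3 + ["Good"] * 2
X = [[i] for i in range(len(y))]

X_res, y_res = random_undersample(X, y)
print(Counter(y_res))
```

Undersampling is normally applied to the training split only, so that the test set keeps the original class distribution.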

After undersampling, the predictions become more balanced:

image
