Add Weighted Sampler for highly imbalanced datasets #8766

pourmand1376 · 2022-07-28T15:12:13Z

Related Issues:

This PR adds a weighted sampler for datasets with highly imbalanced data. The idea is taken from here.

As you know, medical imaging data is usually highly imbalanced, and there is often no way to collect more of it. If you train YOLO with the default sampler on such data, you can end up with 0, 0, 0 for precision, recall, and mAP. This is why a weighted sampler is a must for certain datasets.

🛠️ PR Summary

Made with ❤️ by Ultralytics Actions

🌟 Summary

Added Weighted Sampler option for handling imbalanced datasets in YOLOv5 training.

📊 Key Changes

Integrated Weighted Random Sampler within the training process.
Added weighted_sampler as a boolean argument in the training script to enable the use of the sampler.
Created a utility function to initialize the Weighted Random Sampler based on label distribution.
Ensured the Weighted Sampler is not used during validation for accurate results.
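
The key changes above can be illustrated with a minimal sketch. This is not the code from the PR: the function names, the inverse-frequency class weights, and the choice to give each image the largest weight among its classes are all assumptions made for illustration. It uses only the standard library; in a PyTorch training loop, per-image weights like these would typically be passed to torch.utils.data.WeightedRandomSampler for the training dataloader only.

```python
import random
from collections import Counter


def class_weights_from_labels(labels_per_image):
    """Inverse-frequency weight for every class id that occurs in the dataset."""
    counts = Counter(c for classes in labels_per_image for c in classes)
    return {c: 1.0 / n for c, n in counts.items()}


def image_weights(labels_per_image, class_weights):
    """One sampling weight per image (assumption: the largest weight among its classes)."""
    # Images without labels get the smallest class weight (hypothetical choice).
    background_weight = min(class_weights.values(), default=1.0)
    weights = []
    for classes in labels_per_image:
        present = set(classes)
        weights.append(max((class_weights[c] for c in present), default=background_weight))
    return weights


# Toy dataset: each entry lists the class ids of one image; class 1 is rare.
labels_per_image = [[0], [0, 0], [0], [0], [0, 1], []]
cw = class_weights_from_labels(labels_per_image)
iw = image_weights(labels_per_image, cw)

# Draw one epoch of training indices with replacement, proportional to the weights.
rng = random.Random(0)
epoch_indices = rng.choices(range(len(labels_per_image)), weights=iw, k=len(labels_per_image))
```

In this toy setup the only image that contains the rare class gets the largest weight, so it is drawn more often than any other single image. Validation would keep the default sequential loading, as noted in the key changes.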

🎯 Purpose & Impact

🎯 Purpose: To improve model performance on imbalanced datasets where some classes appear much more frequently than others.
💡 Benefits:
- Helps to prevent model bias towards more frequent classes.
- Aimed at boosting the accuracy for rare classes.
⚠️ Potential Impact:
- Users training models on imbalanced datasets may see better performance.
- The Weighted Sampler feature is not compatible with multi-GPU training setups at the moment.


AyushExel · 2022-07-28T15:27:05Z

@pourmand1376 hey this is a great idea. I've seen this being used for other tasks like segmentation. Do you have a before/after study of the effect of this change on any dataset?

pourmand1376 · 2022-07-28T15:41:55Z

@AyushExel
I am working on a paper which studies this effect on a custom dataset. In the meantime, I will do an analysis on a public dataset and report here. Stay tuned.

glenn-jocher · 2022-07-29T14:21:34Z

@pourmand1376 very cool! Does this overlap or complement the train.py --image-weights argument which also introduces weighted sampling based on image contents and the previous epoch's per-class validation results?

yolov5/train.py, line 466 in e309a85:

    parser.add_argument('--image-weights', action='store_true', help='use weighted image selection for training')
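
For comparison, here is a rough sketch of the kind of sampling described in the comment above: image weights derived from image contents and from the previous epoch's per-class validation results. The function name, the (1 - mAP)^2 scaling, and the toy numbers are assumptions for illustration; the actual computation in train.py may differ.

```python
import random


def image_weights_from_map(class_counts_per_image, base_class_weights, per_class_map):
    """Per-image weights that favor images containing poorly validated classes."""
    # Assumption: a class with low mAP in the previous epoch gets a larger weight.
    class_w = [w * (1.0 - m) ** 2 for w, m in zip(base_class_weights, per_class_map)]
    # An image's weight is its per-class instance counts multiplied by the class weights.
    return [sum(n * w for n, w in zip(counts, class_w)) for counts in class_counts_per_image]


# Toy data: rows are images, columns are instance counts for class 0 and class 1.
class_counts_per_image = [[3, 0], [1, 1], [0, 2]]
base_class_weights = [1.0, 1.0]   # hypothetical starting weights
per_class_map = [0.9, 0.2]        # hypothetical per-class mAP from the previous epoch

iw = image_weights_from_map(class_counts_per_image, base_class_weights, per_class_map)

# Re-draw the image indices for the next epoch, proportional to the weights.
indices = random.Random(0).choices(range(len(iw)), weights=iw, k=len(iw))
```

The difference from a fixed weighted sampler is that these weights change every epoch, because they depend on how well each class validated in the previous one.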

pourmand1376 · 2022-07-31T21:58:25Z

@glenn-jocher, I found no documentation for this option. I do not know what it does. Do we have any documentation about how to test it?

From the code, it seems that it does nothing serious. Am I right?


triple-Mu · 2022-08-16T03:29:46Z

I tested this on coco128, and something is wrong in this line: unique_classes, counts = np.unique(labels_per_class, return_counts=True)

pourmand1376 · 2022-08-16T09:13:07Z

@triple-Mu Thanks for testing!

Can you explain more? How did you find out that something is wrong?
Did you get any errors?

pourmand1376 · 2022-08-16T09:18:34Z

I found this public dataset to test my imbalanced sampling strategy. Although I have tested this on my custom dataset, testing should be done on public datasets to make it reproducible.

I will do an analysis soon.


github-actions · 2023-03-22T00:20:50Z

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions YOLOv5 🚀 and Vision AI ⭐.


github-actions · 2023-10-03T00:21:09Z

👋 Hello there! We wanted to let you know that we've decided to close this pull request due to inactivity. We appreciate the effort you put into contributing to our project, but unfortunately, not all contributions are suitable or aligned with our product roadmap.

We hope you understand our decision, and please don't let it discourage you from contributing to open source projects in the future. We value all of our community members and their contributions, and we encourage you to keep exploring new projects and ways to get involved.

For additional resources and information, please see the links below:

Docs: https://docs.ultralytics.com
HUB: https://hub.ultralytics.com
Community: https://community.ultralytics.com

Thank you for your contributions to YOLO 🚀 and Vision AI ⭐

glenn-jocher · 2024-01-28T03:03:48Z

You're welcome, @triple-Mu! If you encounter any more issues or have further questions, feel free to reach out. Happy training! 😊🚀

Ares-01 · 2024-04-25T15:28:05Z

utils/dataloaders.py

+            label_classes = np.unique(label[:, 0]).tolist()
+            values = []
+            for cls_ in label_classes:
+                values.append(weight_dict[_cls])


Shouldn't this be weight_dict[cls_]? The loop variable is cls_, not _cls, so I assume this is a typo. Maybe worth noting.

pourmand1376 · 2024-04-25

Yep. Well spotted.

However, this PR was not merged and is not going to be, since YOLOv5 is not maintained much anymore.

pourmand1376 added 5 commits July 28, 2022 17:10

add sampler

75e3d76

add flag

0284aaf

rank

bdeeb2c

assert

8e47fd7

add weighted sampler

1f2b073

pourmand1376 changed the title ~~Add Weighted Sampler~~ Add Weighted Sampler for highly imbalanced datasets Jul 28, 2022

pre-commit-ci bot and others added 2 commits July 28, 2022 15:13

[pre-commit.ci] auto fixes from pre-commit.com hooks

c78ebaa

for more information, see https://pre-commit.ci

fix bug

08dfe8b

pourmand1376 and others added 5 commits July 29, 2022 10:24

remove normalized count

0d73fb8

add validation check

fa157e7

remove comment

c577a9c

Merge branch 'master' into add_sampler

0ebde56

Merge branch 'master' into add_sampler

475aeed

pourmand1376 and others added 3 commits August 1, 2022 02:30

Merge branch 'master' into add_sampler

1dc1c51

[pre-commit.ci] auto fixes from pre-commit.com hooks

0201b44

for more information, see https://pre-commit.ci

Merge branch 'master' into add_sampler

9929f04

pourmand1376 mentioned this pull request Aug 2, 2022

Add Pre-commit Hook alshedivat/al-folio#801

Merged

pourmand1376 force-pushed the add_sampler branch from cd14353 to 9929f04 Compare August 2, 2022 17:11

pourmand1376 and others added 4 commits August 2, 2022 21:43

Merge branch 'master' into add_sampler

2f07c9c

[pre-commit.ci] auto fixes from pre-commit.com hooks

d19b3ec

for more information, see https://pre-commit.ci

Merge branch 'master' into add_sampler

c4dc924

Merge branch 'master' into add_sampler

2385e15

Merge branch 'master' into add_sampler

9e5fb7c

pourmand1376 and others added 10 commits August 21, 2022 11:56

Merge branch 'master' into add_sampler

ef34b5c

.

7f4e58d

Merge branch 'add_sampler' of https://github.com/pourmand1376/yolov5 …

65ef280

…into add_sampler

[pre-commit.ci] auto fixes from pre-commit.com hooks

a85e330

for more information, see https://pre-commit.ci

Merge branch 'master' into add_sampler

950bcb6

add keyword arguments

e7fe107

Merge branch 'master' into add_sampler

469636f

Merge branch 'master' into add_sampler

97eb137

Merge branch 'master' into add_sampler

5d0b07b

Merge branch 'master' into add_sampler

2d4a541

pourmand1376 marked this pull request as ready for review August 30, 2022 09:52

pourmand1376 added 6 commits August 30, 2022 21:02

Merge branch 'master' into add_sampler

547bc31

Merge branch 'master' into add_sampler

1878699

Merge branch 'master' into add_sampler

b6ace34

Merge branch 'master' into add_sampler

9a10a0d

Merge branch 'master' into add_sampler

a354b3c

Merge branch 'master' into add_sampler

b846905

github-actions bot added the Stale label Mar 22, 2023

github-actions bot removed the Stale label Apr 10, 2023

pourmand1376 and others added 4 commits May 1, 2023 18:34

Merge branch 'master' into add_sampler

7bfb43c

Signed-off-by: Amir Pourmand <pourmand1376@gmail.com>

[pre-commit.ci] auto fixes from pre-commit.com hooks

75e823c

for more information, see https://pre-commit.ci

Update train.py

82eaa60

fix comman

f18d3cb

pourmand1376 mentioned this pull request May 1, 2023

Custom dataset with imbalanced classes. #492

Closed

github-actions bot added the Stale label Oct 3, 2023

github-actions bot closed this Nov 3, 2023

Ares-01 reviewed Apr 25, 2024
