
Use multi-threading in cache_labels #3505

Merged

Conversation

@deanmark (Contributor) commented Jun 7, 2021

Use multi-threading in cache_labels function. Saves time when loading large datasets.
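
For context, a minimal sketch of the idea (illustrative only — names such as `verify_image_label` and the exact thread cap are assumptions, not the PR's exact code):

```python
import os
from multiprocessing.pool import ThreadPool

from tqdm import tqdm

NUM_THREADS = min(8, os.cpu_count() or 1)  # assumed cap; os.cpu_count() can return None


def verify_image_label(args):
    """Hypothetical worker: validate one (image, label) pair and return its parsed labels."""
    im_file, lb_file = args
    # ... open the image, parse the label file, flag missing/empty/corrupted entries ...
    return im_file, lb_file  # placeholder result


def cache_labels(img_files, label_files):
    """Scan the dataset with a pool of workers instead of a plain for-loop."""
    cache = {}
    with ThreadPool(NUM_THREADS) as pool:
        pbar = tqdm(pool.imap_unordered(verify_image_label, zip(img_files, label_files)),
                    total=len(img_files), desc="Scanning images and labels")
        for im_file, labels in pbar:
            cache[im_file] = labels  # results stream in as each worker finishes
    return cache
```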

🛠️ PR Summary

Made with ❤️ by Ultralytics Actions

🌟 Summary

Improved threading for image processing and label verification.

📊 Key Changes

  • 🧵 Added automatic detection of available CPU cores to optimize threading.
  • 🔄 Switched image caching from a fixed 8-thread pool to variable threads based on CPU count.
  • 📈 Refactored label scanning and verification process to utilize multiprocessing for improved efficiency.

🎯 Purpose & Impact

  • 🚀 The purpose is to enhance performance by adapting resource usage to the system's capabilities, leading to faster data preprocessing.
  • 🔍 The use of multiprocessing can significantly speed up label verification, resulting in quicker dataset preparation times.
  • 👥 This change impacts users by providing more efficient data handling, especially those with powerful CPUs, without altering the core functionality.

@github-actions (bot) left a comment

👋 Hello @deanmark, thank you for submitting a 🚀 PR! To allow your work to be integrated as seamlessly as possible, we advise you to:

  • ✅ Verify your PR is up-to-date with origin/master. If your PR is behind origin/master an automatic GitHub actions rebase may be attempted by including the /rebase command in a comment body, or by running the following code, replacing 'feature' with the name of your local branch:
git remote add upstream https://github.com/ultralytics/yolov5.git
git fetch upstream
git checkout feature  # <----- replace 'feature' with local branch name
git rebase upstream/develop
git push -u origin -f
  • ✅ Verify all Continuous Integration (CI) checks are passing.
  • ✅ Reduce changes to the absolute minimum required for your bug fix or feature addition. "It is not daily increase but daily decrease, hack away the unessential. The closer to the source, the less wastage there is." -Bruce Lee

@deanmark (Contributor, Author) commented Jun 7, 2021

I just realized someone already uploaded a similar PR. I think my PR still has merit because the implementation is better. I use imap_unordered vs. starmap, which is a better fit to this use case. Additionally, the code is more concise.
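
Roughly, the behavioral difference (a toy illustration, not the PR code): `starmap` blocks until every task has finished and then returns the full result list, while `imap_unordered` yields each result as soon as it is ready, so a tqdm bar and per-item warnings can update while the scan is still running.

```python
from multiprocessing.pool import ThreadPool

from tqdm import tqdm

pairs = [("img1.jpg", "img1.txt"), ("img2.jpg", "img2.txt")]  # dummy data

def check(pair):
    return pair  # stand-in for the real verification work

with ThreadPool(4) as pool:
    # starmap-style: nothing is visible until the whole batch is done
    # results = pool.starmap(check, [(p,) for p in pairs])

    # imap_unordered: each finished item is yielded immediately
    for im_file, lb_file in tqdm(pool.imap_unordered(check, pairs), total=len(pairs)):
        pass  # update counters / print warnings here as items complete
```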

@glenn-jocher (Member) commented Jun 7, 2021

@deanmark we have an existing PR for multithreaded label caching in #3385 by @vslaykovsky, though there are a few outstanding problems that we were not able to resolve in that PR:

  1. tqdm progress bar did not present/appear correctly
  2. Corrupted/problem labels were not displayed correctly, i.e.: "File xyx.jpg is corrupted..."
  3. Speeds were slower on datasets smaller than about 5000 images, i.e. COCO128 caching speeds were reduced.

@glenn-jocher (Member) commented:

@deanmark one way to test your PR is to run COCO128 after corrupting one of the labels, i.e. add a 6th column to a row; you should then see a report printed to screen identifying the problem image/label pair:

train: Scanning '../coco128/labels/train2017' images and labels... :   0%|          | 0/128 [00:00<?, ?it/s]
train: WARNING: Ignoring corrupted image and/or label ../coco128/images/train2017/000000000009.jpg: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (8,) + inhomogeneous part.
train: Scanning '../coco128/labels/train2017' images and labels... 128 found, 0 missing, 2 empty, 1 corrupted:  99%|█████████▉| 127/128 [00:00<00:00, 2370.38it/s]
train: New cache created: ../coco128/labels/train2017.cache
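
For example, a corruption like that could be introduced with something along these lines (a hypothetical snippet; the specific file chosen doesn't matter):

```python
# Append a 6th value to the first row of one COCO128 label so it no longer
# parses as (class, x_center, y_center, width, height); the scan should then
# report the image/label pair as corrupted.
label_file = "../coco128/labels/train2017/000000000009.txt"  # illustrative path

with open(label_file) as f:
    lines = f.read().splitlines()

lines[0] += " 0.5"  # extra column

with open(label_file, "w") as f:
    f.write("\n".join(lines) + "\n")
```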

@glenn-jocher (Member) commented:

@deanmark on the surface of things I think imap_unordered should allow for proper tqdm progress bar and error handling integration that was not possible with starmap, so that's a good sign. Do you have any profiling results before and after?

@glenn-jocher (Member) commented:

@deanmark looks like tqdm pbar output is all good, including corruption display. Profiling results on n1-standard-8 GCP instance:
[screenshot: profiling results, 2021-06-07]

Current results

train: Scanning '../coco/train2017' images and labels... 117266 found, 1021 missing, 0 empty, 0 corrupted: 100%|████████████████████████████████████████| 118287/118287 [01:21<00:00, 1443.50it/s]
train: New cache created: ../coco/train2017.cache
val: Scanning '../coco/val2017' images and labels... 4952 found, 48 missing, 0 empty, 0 corrupted: 100%|████████████████████████████████████████████████████| 5000/5000 [00:02<00:00, 1911.45it/s]
val: New cache created: ../coco/val2017.cache

PR results

train: Scanning '../coco/train2017' images and labels... 117266 found, 1021 missing, 0 empty, 0 corrupted: 100%|█████████████████████████████████████████| 118287/118287 [02:07<00:00, 930.79it/s]
train: New cache created: ../coco/train2017.cache
val: Scanning '../coco/val2017' images and labels... 4952 found, 48 missing, 0 empty, 0 corrupted: 100%|████████████████████████████████████████████████████| 5000/5000 [00:04<00:00, 1035.35it/s]
val: New cache created: ../coco/val2017.cache

Hmm, unfortunately the PR took longer than the current code to cache COCO on our GCP instance. This may line up with the results from my earlier experiments in #3385 (comment), which showed a slowdown vs. the default with ThreadPool.imap(8).

Perhaps, until we find a better solution, we could update this PR to use a normal for-loop while keeping the refactored function you created, so that users can run easier multiprocessing experiments going forward.
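
A sketch of that fallback (hypothetical names, reusing a per-item verify function from the refactor): drive the same function with a plain loop, so tqdm and the corruption warnings behave exactly as before.

```python
from tqdm import tqdm


def cache_labels_serial(img_files, label_files, verify_image_label):
    """Fallback: no pool, just a plain loop over the refactored per-item function."""
    cache = {}
    pbar = tqdm(zip(img_files, label_files), total=len(img_files),
                desc="Scanning images and labels")
    for args in pbar:
        im_file, labels = verify_image_label(args)
        cache[im_file] = labels
    return cache
```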

@deanmark (Contributor, Author) commented Jun 7, 2021

  1. tqdm progress bar did not present/appear correctly
  2. Corrupted/problem labels were not displayed correctly, i.e.: "File xyx.jpg is corrupted..."
  3. Speeds were slower on datasets smaller than about 5000 images, i.e. COCO128 caching speeds were reduced.

The issues you raised are handled properly in this PR. Here are some profiling results:

| Method | COCO128 | VOC train 16.5k | VOC val 4.9k |
| --- | --- | --- | --- |
| default | 1.5 s | 227.4 s | 75.4 s |
| ThreadPool(8).imap_unordered | 1.03 s | 138.4 s | 50.3 s |
| ThreadPool(4).imap_unordered | 1.01 s | 123 s | 49.2 s |
| ThreadPool(2).imap_unordered | 1.02 s | 144.7 s | 50.6 s |
| ThreadPool(4).imap | 1 s | 125.3 s | 48.74 s |
| Pool(4).map | 1.27 s | 127.5 s | 50.6 s |
| Pool(4).imap_unordered | 1.25 s | 126.3 s | 48.88 s |

The processor used was a Xeon E5-2690, with the images and labels residing on a network drive. Interestingly, the best results were achieved using 4 threads. I also added several other methods to the comparison: ThreadPool.imap, Pool.map, and Pool.imap_unordered. All of these methods perform roughly the same on my system.

@deanmark (Contributor, Author) commented Jun 7, 2021

@glenn-jocher if Pool(8).starmap worked well on your setup, maybe try using Pool.imap/imap_unordered.
Also, I would try reducing the number of threads. My CPU reports a cpu_count of 16, but the best results were obtained with threads=4.
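
In other words, something along these lines (illustrative; the cap of 4 comes from the table above, where 4 workers beat both 2 and 8 on this machine):

```python
import os

# Cap the worker count instead of using every core; the scan appears to be
# I/O-bound here (network drive), so more threads stop helping past a point.
num_threads = min(4, os.cpu_count() or 1)
```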

@glenn-jocher (Member) commented Jun 7, 2021

@deanmark hmm, interesting. The network is likely your bottleneck, then. We always recommend training with local data rather than a mounted bucket or network drive. If I repeat with VOC on the same n1-standard-8 instance (datasets on a 500 GB SSD), I get these results:

Current VOC (5s)

train: Scanning '../VOC/labels/train' images and labels... 16551 found, 0 missing, 0 empty, 0 corrupted: 100%|██████████| 16551/16551 [00:05<00:00, 3264.42it/s]
train: New cache created: ../VOC/labels/train.cache
val: Scanning '../VOC/labels/val' images and labels... 4952 found, 0 missing, 0 empty, 0 corrupted: 100%|█████████████████| 4952/4952 [00:02<00:00, 2331.41it/s]
val: New cache created: ../VOC/labels/val.cache

PR VOC ThreadPool(8).imap_unordered (15s)

train: Scanning '../VOC/labels/train' images and labels... 16551 found, 0 missing, 0 empty, 0 corrupted: 100%|██████████| 16551/16551 [00:15<00:00, 1046.35it/s]
train: New cache created: ../VOC/labels/train.cache
val: Scanning '../VOC/labels/val' images and labels... 4952 found, 0 missing, 0 empty, 0 corrupted: 100%|█████████████████| 4952/4952 [00:04<00:00, 1201.11it/s]
val: New cache created: ../VOC/labels/val.cache

PR VOC Pool(8).imap_unordered (1s) 🚀

train: Scanning '../VOC/labels/train' images and labels... 16551 found, 0 missing, 0 empty, 0 corrupted: 100%|███████████| 16551/16551 [00:01<00:00, 9023.54it/s]
train: New cache created: ../VOC/labels/train.cache
val: Scanning '../VOC/labels/val' images and labels... 4952 found, 0 missing, 0 empty, 0 corrupted: 100%|█████████████████| 4952/4952 [00:01<00:00, 4321.09it/s]
val: New cache created: ../VOC/labels/val.cache

@glenn-jocher (Member) commented:

@deanmark the starmap method used in #3385 does result in speed increases over current code for large datasets like VOC and COCO, but it does not play well with tqdm and progress bar outputs, so without additional PR work there the user would be blind to corrupted image reports.

@deanmark (Contributor, Author) commented Jun 7, 2021

@glenn-jocher I moved the VOC dataset to local storage and could reproduce your timings.

| Method | VOC train 16.5k | VOC val 4.9k |
| --- | --- | --- |
| default | 5.75 s | 1.83 s |
| Pool(8).imap_unordered | 2.58 s | 1.6 s |
| ThreadPool(8).imap_unordered | 10.4 s | 3.37 s |

These results were obtained with the same Xeon E5-2690 processor, but this time the data was stored locally.
I'll upload my new code using Pool.imap_unordered; this should improve on the baseline results.
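
Relative to the ThreadPool sketch earlier in the thread, the change is essentially a swap to a process pool (a sketch under assumed names, not the exact diff; the worker must be a picklable, module-level function):

```python
import os
from multiprocessing import Pool  # process pool instead of multiprocessing.pool.ThreadPool

from tqdm import tqdm

NUM_THREADS = min(8, os.cpu_count() or 1)  # assumed cap


def verify_image_label(args):  # module-level so it can be pickled by Pool workers
    im_file, lb_file = args
    # ... same per-item verification as before ...
    return im_file, lb_file  # placeholder result


def cache_labels(img_files, label_files):
    cache = {}
    with Pool(NUM_THREADS) as pool:  # was: ThreadPool(NUM_THREADS)
        pbar = tqdm(pool.imap_unordered(verify_image_label, zip(img_files, label_files)),
                    total=len(img_files), desc="Scanning images and labels")
        for im_file, labels in pbar:
            cache[im_file] = labels
    return cache


if __name__ == "__main__":  # guard matters for process pools with the spawn start method
    cache_labels(["img1.jpg"], ["img1.txt"])
```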

@glenn-jocher glenn-jocher deleted the branch ultralytics:develop June 8, 2021 08:22
@glenn-jocher glenn-jocher reopened this Jun 8, 2021
@glenn-jocher glenn-jocher merged commit 28bff22 into ultralytics:develop Jun 8, 2021
@glenn-jocher (Member) commented:

@deanmark I've tested your latest updates and the speeds are much improved! Results on VOC now show about a 3x speedup vs. the current default in #3505 (comment).

PR is merged. Thank you for your contributions to YOLOv5 🚀 and Vision AI ⭐!

@deanmark (Contributor, Author) commented Jun 8, 2021

@glenn-jocher My pleasure! Keep up the good work with this amazing code.

glenn-jocher added a commit that referenced this pull request Jun 8, 2021: Minor updates to #3505, inplace accumulation.
@glenn-jocher glenn-jocher mentioned this pull request Jun 8, 2021
Lechtr pushed a commit to Lechtr/yolov5 that referenced this pull request Jul 20, 2021
* Use multi threading in cache_labels

* PEP8 reformat

* Add num_threads

* changed ThreadPool.imap_unordered to Pool.imap_unordered

* Remove inplace additions

* Update datasets.py

refactor initial desc

Co-authored-by: Glenn Jocher <glenn.jocher@ultralytics.com>
(cherry picked from commit 28bff22)
Lechtr pushed a commit to Lechtr/yolov5 that referenced this pull request Jul 20, 2021
Minor updates to ultralytics#3505, inplace accumulation.

(cherry picked from commit 8d52c1c)
BjarneKuehl pushed a commit to fhkiel-mlaip/yolov5 that referenced this pull request Aug 26, 2022 (same commit message as above)
BjarneKuehl pushed a commit to fhkiel-mlaip/yolov5 that referenced this pull request Aug 26, 2022: Minor updates to ultralytics#3505, inplace accumulation.
SecretStar112 added a commit to SecretStar112/yolov5 that referenced this pull request May 24, 2023
Minor updates to ultralytics/yolov5#3505, inplace accumulation.