
Use multi-threading in cache_labels #3505

Merged

Conversation

@deanmark (Contributor) commented Jun 7, 2021

Use multi-threading in cache_labels function. Saves time when loading large datasets.
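
For context, a minimal sketch of the idea (illustrative only — names such as `verify_image_label` and the exact thread cap are assumptions, not the PR's exact code):

```python
import os
from multiprocessing.pool import ThreadPool

from tqdm import tqdm

NUM_THREADS = min(8, os.cpu_count() or 1)  # assumed cap; os.cpu_count() can return None


def verify_image_label(args):
    """Hypothetical worker: validate one (image, label) pair and return its parsed labels."""
    im_file, lb_file = args
    # ... open the image, parse the label file, flag missing/empty/corrupted entries ...
    return im_file, lb_file  # placeholder result


def cache_labels(img_files, label_files):
    """Scan the dataset with a pool of workers instead of a plain for-loop."""
    cache = {}
    with ThreadPool(NUM_THREADS) as pool:
        pbar = tqdm(pool.imap_unordered(verify_image_label, zip(img_files, label_files)),
                    total=len(img_files), desc="Scanning images and labels")
        for im_file, labels in pbar:
            cache[im_file] = labels  # results stream in as each worker finishes
    return cache
```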

🛠️ PR Summary

Made with ❤️ by Ultralytics Actions

🌟 Summary

Improved threading for image processing and label verification.

📊 Key Changes

  • 🧵 Added automatic detection of available CPU cores to optimize threading.
  • 🔄 Switched image caching from a fixed 8-thread pool to variable threads based on CPU count.
  • 📈 Refactored label scanning and verification process to utilize multiprocessing for improved efficiency.

🎯 Purpose & Impact

  • 🚀 The purpose is to enhance performance by adapting resource usage to the system's capabilities, leading to faster data preprocessing.
  • 🔍 The use of multiprocessing can significantly speed up label verification, resulting in quicker dataset preparation times.
  • 👥 This change impacts users by providing more efficient data handling, especially those with powerful CPUs, without altering the core functionality.

@github-actions (bot) left a comment

👋 Hello @deanmark, thank you for submitting a 🚀 PR! To allow your work to be integrated as seamlessly as possible, we advise you to:

  • ✅ Verify your PR is up-to-date with origin/master. If your PR is behind origin/master an automatic GitHub actions rebase may be attempted by including the /rebase command in a comment body, or by running the following code, replacing 'feature' with the name of your local branch:
git remote add upstream https://github.com/ultralytics/yolov5.git
git fetch upstream
git checkout feature  # <----- replace 'feature' with local branch name
git rebase upstream/develop
git push -u origin -f
  • ✅ Verify all Continuous Integration (CI) checks are passing.
  • ✅ Reduce changes to the absolute minimum required for your bug fix or feature addition. "It is not daily increase but daily decrease, hack away the unessential. The closer to the source, the less wastage there is." -Bruce Lee

@deanmark (Contributor, Author) commented Jun 7, 2021

I just realized someone already uploaded a similar PR. I think my PR still has merit because the implementation is better. I use imap_unordered vs. starmap, which is a better fit to this use case. Additionally, the code is more concise.
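
Roughly, the behavioral difference (a toy illustration, not the PR code): `starmap` blocks until every task has finished and then returns the full result list, while `imap_unordered` yields each result as soon as it is ready, so a tqdm bar and per-item warnings can update while the scan is still running.

```python
from multiprocessing.pool import ThreadPool

from tqdm import tqdm

pairs = [("img1.jpg", "img1.txt"), ("img2.jpg", "img2.txt")]  # dummy data

def check(pair):
    return pair  # stand-in for the real verification work

with ThreadPool(4) as pool:
    # starmap-style: nothing is visible until the whole batch is done
    # results = pool.starmap(check, [(p,) for p in pairs])

    # imap_unordered: each finished item is yielded immediately
    for im_file, lb_file in tqdm(pool.imap_unordered(check, pairs), total=len(pairs)):
        pass  # update counters / print warnings here as items complete
```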

@glenn-jocher (Member) commented Jun 7, 2021

@deanmark we have an existing PR for multithreaded label caching in #3385 by @vslaykovsky, though there are a few outstanding problems that we were not able to resolve in that PR:

  1. tqdm progress bar did not present/appear correctly
  2. Corrupted/problem labels were not displayed correctly, i.e.: "File xyx.jpg is corrupted..."
  3. Speeds were slower on datasets smaller than about 5000 images, i.e. COCO128 caching speeds were reduced.

@glenn-jocher (Member) commented:

@deanmark one way to test your PR is to run COCO128 after corrupting one of the labels, i.e. add a 6th column to a row; you should then see a report printed to screen identifying the problem image/label pair:

train: Scanning '../coco128/labels/train2017' images and labels... :   0%|          | 0/128 [00:00<?, ?it/s]
train: WARNING: Ignoring corrupted image and/or label ../coco128/images/train2017/000000000009.jpg: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (8,) + inhomogeneous part.
train: Scanning '../coco128/labels/train2017' images and labels... 128 found, 0 missing, 2 empty, 1 corrupted:  99%|█████████▉| 127/128 [00:00<00:00, 2370.38it/s]
train: New cache created: ../coco128/labels/train2017.cache
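
For example, a corruption like that could be introduced with something along these lines (a hypothetical snippet; the specific file chosen doesn't matter):

```python
# Append a 6th value to the first row of one COCO128 label so it no longer
# parses as (class, x_center, y_center, width, height); the scan should then
# report the image/label pair as corrupted.
label_file = "../coco128/labels/train2017/000000000009.txt"  # illustrative path

with open(label_file) as f:
    lines = f.read().splitlines()

lines[0] += " 0.5"  # extra column

with open(label_file, "w") as f:
    f.write("\n".join(lines) + "\n")
```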

@glenn-jocher (Member) commented:

@deanmark on the surface of things I think imap_unordered should allow for proper tqdm progress bar and error handling integration that was not possible with starmap, so that's a good sign. Do you have any profiling results before and after?

@glenn-jocher (Member) commented:

@deanmark looks like tqdm pbar output is all good, including corruption display. Profiling results on n1-standard-8 GCP instance:
[screenshot: profiling results, 2021-06-07]

Current results

train: Scanning '../coco/train2017' images and labels... 117266 found, 1021 missing, 0 empty, 0 corrupted: 100%|████████████████████████████████████████| 118287/118287 [01:21<00:00, 1443.50it/s]
train: New cache created: ../coco/train2017.cache
val: Scanning '../coco/val2017' images and labels... 4952 found, 48 missing, 0 empty, 0 corrupted: 100%|████████████████████████████████████████████████████| 5000/5000 [00:02<00:00, 1911.45it/s]
val: New cache created: ../coco/val2017.cache

PR results

train: Scanning '../coco/train2017' images and labels... 117266 found, 1021 missing, 0 empty, 0 corrupted: 100%|█████████████████████████████████████████| 118287/118287 [02:07<00:00, 930.79it/s]
train: New cache created: ../coco/train2017.cache
val: Scanning '../coco/val2017' images and labels... 4952 found, 48 missing, 0 empty, 0 corrupted: 100%|████████████████████████████████████████████████████| 5000/5000 [00:04<00:00, 1035.35it/s]
val: New cache created: ../coco/val2017.cache

Hmm, unfortunately the PR took longer than the current code to cache COCO on our GCP instance. This may line up with the results from my earlier experiments in #3385 (comment), which showed a slowdown vs. the default with ThreadPool.imap(8).

Perhaps, until we find a better solution, we could update this PR to use a normal for-loop while keeping the refactored function you created, so that users can run easier multiprocessing experiments going forward.
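
A sketch of that fallback (hypothetical names, reusing a per-item verify function from the refactor): drive the same function with a plain loop, so tqdm and the corruption warnings behave exactly as before.

```python
from tqdm import tqdm


def cache_labels_serial(img_files, label_files, verify_image_label):
    """Fallback: no pool, just a plain loop over the refactored per-item function."""
    cache = {}
    pbar = tqdm(zip(img_files, label_files), total=len(img_files),
                desc="Scanning images and labels")
    for args in pbar:
        im_file, labels = verify_image_label(args)
        cache[im_file] = labels
    return cache
```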

@deanmark (Contributor, Author) commented Jun 7, 2021

  1. tqdm progress bar did not present/appear correctly
  2. Corrupted/problem labels were not displayed correctly, i.e.: "File xyx.jpg is corrupted..."
  3. Speeds were slower on datasets smaller than about 5000 images, i.e. COCO128 caching speeds were reduced.

The issues you raised are handled properly in this PR. Here are some profiling results:

| Method | COCO128 | VOC train 16.5k | VOC val 4.9k |
| --- | --- | --- | --- |
| default | 1.5 s | 227.4 s | 75.4 s |
| ThreadPool(8).imap_unordered | 1.03 s | 138.4 s | 50.3 s |
| ThreadPool(4).imap_unordered | 1.01 s | 123 s | 49.2 s |
| ThreadPool(2).imap_unordered | 1.02 s | 144.7 s | 50.6 s |
| ThreadPool(4).imap | 1 s | 125.3 s | 48.74 s |
| Pool(4).map | 1.27 s | 127.5 s | 50.6 s |
| Pool(4).imap_unordered | 1.25 s | 126.3 s | 48.88 s |

The processor used was a Xeon E5-2690, with the images and labels residing on a network drive. Interestingly, the best results were achieved using 4 threads. I also added several other methods to the comparison: ThreadPool.imap, Pool.map, and Pool.imap_unordered. All of these methods perform roughly the same on my system.

@deanmark (Contributor, Author) commented Jun 7, 2021

@glenn-jocher if Pool(8).starmap worked well on your setup, maybe try using Pool.imap/imap_unordered.
Also, I would try reducing the number of threads. My CPU reports a cpu_count of 16, but the best results were obtained with threads=4.
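
In other words, something along these lines (illustrative; the cap of 4 comes from the table above, where 4 workers beat both 2 and 8 on this machine):

```python
import os

# Cap the worker count instead of using every core; the scan appears to be
# I/O-bound here (network drive), so more threads stop helping past a point.
num_threads = min(4, os.cpu_count() or 1)
```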

@glenn-jocher (Member) commented Jun 7, 2021

@deanmark hmm, interesting. The network is likely your bottleneck, then. We always recommend training with local data rather than a mounted bucket or network drive. If I repeat with VOC on the same n1-standard-8 instance (datasets on a 500 GB SSD), I get these results:

Current VOC (5s)

train: Scanning '../VOC/labels/train' images and labels... 16551 found, 0 missing, 0 empty, 0 corrupted: 100%|██████████| 16551/16551 [00:05<00:00, 3264.42it/s]
train: New cache created: ../VOC/labels/train.cache
val: Scanning '../VOC/labels/val' images and labels... 4952 found, 0 missing, 0 empty, 0 corrupted: 100%|█████████████████| 4952/4952 [00:02<00:00, 2331.41it/s]
val: New cache created: ../VOC/labels/val.cache

PR VOC ThreadPool(8).imap_unordered (15s)

train: Scanning '../VOC/labels/train' images and labels... 16551 found, 0 missing, 0 empty, 0 corrupted: 100%|██████████| 16551/16551 [00:15<00:00, 1046.35it/s]
train: New cache created: ../VOC/labels/train.cache
val: Scanning '../VOC/labels/val' images and labels... 4952 found, 0 missing, 0 empty, 0 corrupted: 100%|█████████████████| 4952/4952 [00:04<00:00, 1201.11it/s]
val: New cache created: ../VOC/labels/val.cache

PR VOC Pool(8).imap_unordered (1s) 🚀

train: Scanning '../VOC/labels/train' images and labels... 16551 found, 0 missing, 0 empty, 0 corrupted: 100%|███████████| 16551/16551 [00:01<00:00, 9023.54it/s]
train: New cache created: ../VOC/labels/train.cache
val: Scanning '../VOC/labels/val' images and labels... 4952 found, 0 missing, 0 empty, 0 corrupted: 100%|█████████████████| 4952/4952 [00:01<00:00, 4321.09it/s]
val: New cache created: ../VOC/labels/val.cache

@glenn-jocher (Member) commented:

@deanmark the starmap method used in #3385 does result in speed increases over current code for large datasets like VOC and COCO, but it does not play well with tqdm and progress bar outputs, so without additional PR work there the user would be blind to corrupted image reports.

@deanmark (Contributor, Author) commented Jun 7, 2021

@glenn-jocher I moved the VOC dataset to local storage and could reproduce your timings.

| Method | VOC train 16.5k | VOC val 4.9k |
| --- | --- | --- |
| default | 5.75 s | 1.83 s |
| Pool(8).imap_unordered | 2.58 s | 1.6 s |
| ThreadPool(8).imap_unordered | 10.4 s | 3.37 s |

These results were obtained with the same Xeon E5-2690 processor, but this time the data was stored locally.
I'll upload my new code using Pool.imap_unordered; this should improve on the baseline results.
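
Relative to the ThreadPool sketch earlier in the thread, the change is essentially a swap to a process pool (a sketch under assumed names, not the exact diff; the worker must be a picklable, module-level function):

```python
import os
from multiprocessing import Pool  # process pool instead of multiprocessing.pool.ThreadPool

from tqdm import tqdm

NUM_THREADS = min(8, os.cpu_count() or 1)  # assumed cap


def verify_image_label(args):  # module-level so it can be pickled by Pool workers
    im_file, lb_file = args
    # ... same per-item verification as before ...
    return im_file, lb_file  # placeholder result


def cache_labels(img_files, label_files):
    cache = {}
    with Pool(NUM_THREADS) as pool:  # was: ThreadPool(NUM_THREADS)
        pbar = tqdm(pool.imap_unordered(verify_image_label, zip(img_files, label_files)),
                    total=len(img_files), desc="Scanning images and labels")
        for im_file, labels in pbar:
            cache[im_file] = labels
    return cache


if __name__ == "__main__":  # guard matters for process pools with the spawn start method
    cache_labels(["img1.jpg"], ["img1.txt"])
```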

@glenn-jocher glenn-jocher deleted the branch ultralytics:develop June 8, 2021 08:22
@glenn-jocher glenn-jocher reopened this Jun 8, 2021
@glenn-jocher glenn-jocher merged commit 28bff22 into ultralytics:develop Jun 8, 2021
@glenn-jocher (Member) commented:

@deanmark I've tested your latest updates and the speeds are much improved! Results on VOC now show about a 3x speedup vs. the current default in #3505 (comment).

PR is merged. Thank you for your contributions to YOLOv5 🚀 and Vision AI ⭐!

@deanmark (Contributor, Author) commented Jun 8, 2021

@glenn-jocher My pleasure! Keep up the good work with this amazing code.

glenn-jocher added a commit that referenced this pull request Jun 8, 2021: Minor updates to #3505, inplace accumulation.
@glenn-jocher glenn-jocher mentioned this pull request Jun 8, 2021
Lechtr pushed a commit to Lechtr/yolov5 that referenced this pull request Jul 20, 2021
* Use multi threading in cache_labels

* PEP8 reformat

* Add num_threads

* changed ThreadPool.imap_unordered to Pool.imap_unordered

* Remove inplace additions

* Update datasets.py

refactor initial desc

Co-authored-by: Glenn Jocher <glenn.jocher@ultralytics.com>
(cherry picked from commit 28bff22)
Lechtr pushed a commit to Lechtr/yolov5 that referenced this pull request Jul 20, 2021
Minor updates to ultralytics#3505, inplace accumulation.

(cherry picked from commit 8d52c1c)
BjarneKuehl pushed a commit to fhkiel-mlaip/yolov5 that referenced this pull request Aug 26, 2022 (same commit message as above)
BjarneKuehl pushed a commit to fhkiel-mlaip/yolov5 that referenced this pull request Aug 26, 2022: Minor updates to ultralytics#3505, inplace accumulation.
SecretStar112 added a commit to SecretStar112/yolov5 that referenced this pull request May 24, 2023
Minor updates to ultralytics/yolov5#3505, inplace accumulation.