Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Image processing parallelity #12845

Closed
1 task
reisi007 opened this issue Dec 28, 2024 · 3 comments · Fixed by #12922
Closed
1 task

Image processing parallelity #12845

reisi007 opened this issue Dec 28, 2024 · 3 comments · Fixed by #12922
Labels
needs triage Issue needs to be triaged

Comments

@reisi007
Copy link

Astro Info

Astro                    v5.1.1
Node                     v22.12.0
System                   Windows (x64)
Package Manager          unknown
Output                   static
Adapter                  none
Integrations             @astrojs/mdx
                         @astrojs/sitemap
                         @astrojs/tailwind
                         @astrojs/solid-js

If this issue only occurs in one browser, which browser is a problem?

Build issue

Describe the Bug

Performance of image processing could be improved by making it more parallel. I am having a 16 core CPU and the CPU utilization is 25-40%. By the numbers of images I added, I would have expected better utilization of the CPU by processing more pictures in parallel

What's the expected result?

More load on the CPU when optimizing images by using all cores

Link to Minimal Reproducible Example


Participation

  • I am willing to submit a pull request for this issue.
@github-actions github-actions bot added the needs triage Issue needs to be triaged label Dec 28, 2024
@ascorbic
Copy link
Contributor

ascorbic commented Jan 7, 2025

It already uses a concurrency based on the CPU count. There may be other factors preventing full CPU utilisation. The most likely is probably disk IO, but as image generation is part of the build pipeline there are lots of other factors that might be affecting it. If you feel like digging in, there isn't any specific logging there about CPU utilisation, but you can have a look in the part of the code that I linked and maybe add debug logs to see if it's fully using your cores, and expeirment with manually setting different values and see if it improves performance. Node CPU counts can be unreliable, particularly when running in containerised environments, so it may be worth adding support for manually setting concurrency. If you decide to look into this I'd be interested in seeing your results, but for now I'm closing this as it's not really a bug.

@ascorbic ascorbic closed this as completed Jan 7, 2025
@adamchal
Copy link
Contributor

adamchal commented Jan 7, 2025

@ascorbic thanks for the insight. I’ve been wrestling with this and discovered the underlying issue. Just below your reference in core/build/generate.ts, there is an await in the for loop.

for (const [originalPath, transforms] of staticImageList) {
	await generateImagesForPath(originalPath, transforms, assetsCreationPipeline, queue);
}

Inside the loop generateImagesForPath adds to the queue, which is throttled using p-queue with a concurrency based on Node’s os.cpus().length. However, since there is an await, assets are never generated concurrently or in parallel.

In my test of roughly 2,000 image transforms, removing the await increased the asset generation by 16x: 25s instead of 400s. Coincidentally, my os.cpus().length is also 16.

I would recommend removing the await inside the for loop. I’m happy to create a pull request to remove the await if you agree:

for (const [originalPath, transforms] of staticImageList) {
	generateImagesForPath(originalPath, transforms, assetsCreationPipeline, queue);
}

Additionally, there is another await in assets/build/generate.ts. This is basically forcing the processing of each source image transforms to be serialized. I would need to study this more, but in a naive test on my repo of roughly 2,000 transforms of ~250 source images, removing this await only increased the asset generation by 8x (instead of 16x). My initial thinking of this drop in performance is likely because there is too much context switching between source images. However, in the (unlikely) scenario where you have a single source image and 2,000 transforms, those assets would be generated serially with no parallelization. Probably not a common scenario, but worth mentioning.

I think adding a configurable concurrency in astro.config.ts would be a nice-to-have. Something similar to the experimental build.concurrency, but maybe on the image or image.service property.

@ascorbic
Copy link
Contributor

ascorbic commented Jan 7, 2025

Well spotted! Removing that should be fine, because we await the idle queue below it. A PR to just remove that await would be a good start. The others would need more thought.

ascorbic added a commit that referenced this issue Jan 10, 2025
* fix(perf): do not `await` in `for` loop to enqueue asset transforms. Fixes #12845

* fix: avoid `queue.add()` during async for loop

* Update changeset

---------

Co-authored-by: Matt Kane <m@mk.gg>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs triage Issue needs to be triaged
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants