Extremely high CPU use by ffmpeg #75

Closed
kevinmcmurtrie opened this issue Nov 21, 2023 · 13 comments · Fixed by #87

@kevinmcmurtrie

MPEG4 to WebM transcoding is extremely slow. https://farm.openzim.org/pipeline/c06a8148-7d9a-422c-b5b4-abfe93d51168 has been crawling along for two weeks while using 100% of all CPUs.

@kevinmcmurtrie
Author

Notes:
I can't reproduce this slowness outside of the docker container.

  1. There may be conflicting FFmpeg options that cause excessive backtracking and retries. The command below sets both a minimum and a maximum bitrate as well as a minimum and a maximum quantizer (-qmin/-qmax), which leaves the encoder very little room to satisfy all constraints at once.

'ffmpeg', '-y', '-i', 'file:/output/tmpmf3sut4d/afe1380b936363857ce244eb5eda4019.mp4', '-max_muxing_queue_size', '9999', '-codec:v', 'libvpx', '-quality', 'best', '-b:v', '300k', '-maxrate', '300k', '-minrate', '300k', '-qmin', '30', '-qmax', '42', '-vf', "scale='480:trunc(ow/a/2)*2'", '-codec:a', 'libvorbis', '-ar', '44100', '-b:a', '48k', 'file:/tmp/tmph5ib_ryv/video.tmp.webm'

Something simpler may help. This targets a bitrate of 300 kbit/s with a 512 kbit rate-control buffer and allows a wider quality range:
'ffmpeg', '-y', '-i', 'file:/output/tmpmf3sut4d/afe1380b936363857ce244eb5eda4019.mp4', '-max_muxing_queue_size', '9999', '-codec:v', 'libvpx', '-quality', 'best', '-b:v', '300k', '-bufsize', '512k', '-qmin', '20', '-vf', "scale='480:trunc(ow/a/2)*2'", '-codec:a', 'libvorbis', '-ar', '44100', '-b:a', '48k', 'file:/tmp/tmph5ib_ryv/video.tmp.webm'

At least for me, the second one encodes faster, looks better, and consumes about 1/3 the bandwidth. There's no minimum bitrate and no maximum q, so it can fly past all of those motionless whiteboard images.

  2. The FFmpeg encoder may be old?
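
To compare the two argument lists above on the same input, a minimal Python sketch could look like the following. The helper names `build_args` and `time_encode` are hypothetical, as are any paths passed to them; the options themselves are copied from the two commands quoted above. Timing assumes an `ffmpeg` binary with libvpx and libvorbis is available on PATH.

```python
import subprocess
import time

# Options shared by both variants, copied from the commands quoted above.
COMMON_VIDEO = ["-codec:v", "libvpx", "-quality", "best", "-b:v", "300k"]
COMMON_TAIL = [
    "-vf", "scale='480:trunc(ow/a/2)*2'",
    "-codec:a", "libvorbis", "-ar", "44100", "-b:a", "48k",
]

# Rate-control options: the only part that differs between the two commands.
RATE_CONTROL = {
    "current": ["-maxrate", "300k", "-minrate", "300k", "-qmin", "30", "-qmax", "42"],
    "proposed": ["-bufsize", "512k", "-qmin", "20"],
}


def build_args(variant, src, dst):
    """Return the ffmpeg argument list for one variant (hypothetical helper)."""
    return (
        ["ffmpeg", "-y", "-i", f"file:{src}", "-max_muxing_queue_size", "9999"]
        + COMMON_VIDEO
        + RATE_CONTROL[variant]
        + COMMON_TAIL
        + [f"file:{dst}"]
    )


def time_encode(variant, src, dst):
    """Run one encode and return wall-clock seconds (needs ffmpeg on PATH)."""
    start = time.perf_counter()
    subprocess.run(build_args(variant, src, dst), check=True, capture_output=True)
    return time.perf_counter() - start
```

Calling `time_encode("current", ...)` and `time_encode("proposed", ...)` on the same source file, inside and outside the container, would give comparable numbers for both speed and output size.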

@benoit74
Collaborator

Thank you for reporting this and doing some tests. I will have a look into it.

@benoit74
Collaborator

Regarding versions, image ghcr.io/openzim/kolibri:1.1.0 is using:

  • ffmpeg version 5.1.3-1 Copyright (c) 2000-2022 the FFmpeg developers
  • for the libvpx encoder, the libvpx7 apt package, version 1.11.0-2ubuntu2.2. libvpx 1.11.0 was originally released (in binary form) on Oct 7, 2021. As of today, it is the latest version on Ubuntu LTS, so this is probably not a big deal.
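
For anyone who wants to check the version in their own environment, here is a small sketch that extracts the version token from the banner printed by `ffmpeg -version`. The function name `parse_ffmpeg_version` is hypothetical, and the sample banner is the line quoted above.

```python
import re
import subprocess


def parse_ffmpeg_version(banner):
    """Return the version token from an `ffmpeg -version` banner, or None."""
    match = re.match(r"ffmpeg version (\S+)", banner)
    return match.group(1) if match else None


def installed_ffmpeg_version():
    """Ask the local ffmpeg binary for its version (needs ffmpeg on PATH)."""
    result = subprocess.run(
        ["ffmpeg", "-version"], capture_output=True, text=True, check=True
    )
    return parse_ffmpeg_version(result.stdout)


# Banner line as reported for the ghcr.io/openzim/kolibri:1.1.0 image.
sample = "ffmpeg version 5.1.3-1 Copyright (c) 2000-2022 the FFmpeg developers"
print(parse_ffmpeg_version(sample))
```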

@benoit74
Collaborator

Regarding ffmpeg settings, @rgaudin @kelson42 do we have any past issue which discusses why these settings were chosen? I imagine finding one preset to more or less rule them all is not an easy feat.

The encoding logic and the presets come from python-scraperlib (we use the low-quality webm preset for the Khan Academy recipe).

The low-quality webm presets come from openzim/python-scraperlib#14 (openzim/python-scraperlib@78e2bb0#diff-2cc68edde814805fe24114313acdde91ae832adef02f7d0576675d74db3f7b58 more precisely), but I did not find any discussion there. They were probably ported from the ted/youtube scrapers, but I failed to find any discussion over there either.

@rgaudin
Member

rgaudin commented Nov 27, 2023

  • we chose to go with webm/vp9 not because it was easy but because it was hard: we picked it for legal/ethical reasons, which made it very difficult to work with on some platforms (I believe it's still partly broken on some Apple ones).
  • this was discussed and tested. I remember we had a table with various candidate presets with accompanying videos.
  • those presets are not used in the farm. We use low-quality webm everywhere.
  • ffmpeg is complicated and our presets have not been validated by anyone who has mastered it. So I'm not surprised there would be issues.
  • there are open discussions about video encoding (in scraperlib or overview, I think) around format as well.
  • there is no consensus on what quality we want for videos in our ZIMs. Some people focus only on the smallest possible file size while others would prefer better-looking quality (at the expense of file size). It's difficult to achieve with a generic approach because of the wide range of videos we get and devices used to read them.

@benoit74
Collaborator

I tested suggested settings on https://studio.learningequality.org/content/storage/b/7/b71ca7f102ae16e4023c9f49b015d6b7.mp4

I do not find a significant visual difference in the resulting file (but this is obviously very personal).

I confirm that processing is a little faster (from 10 s to 8 s) and that the file is more than 3 times smaller (from 2.7MB to 768KB, while the original mp4 is 690KB).

I do not find any difference in processing time between running in Docker and running on the host directly (same machine), so there is probably something strange/unusual in the Docker setup on your machine.
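
As a quick check of the figures reported above, the ratios can be worked out in a few lines of Python. This is only a sketch: it assumes binary units (1 MB = 1024 KB), which the measurements do not specify, and the variable names are illustrative.

```python
KB_PER_MB = 1024  # assumption: binary units, not stated with the measurements

current_kb = 2.7 * KB_PER_MB   # output size with the current settings
proposed_kb = 768              # output size with the suggested settings
original_kb = 690              # size of the source mp4

size_ratio = current_kb / proposed_kb   # how many times smaller the new file is
speedup = 10 / 8                        # encode time went from 10 s to 8 s
vs_source = proposed_kb / original_kb   # new output relative to the source mp4

print(f"size ratio: {size_ratio:.2f}x smaller")
print(f"speedup:    {speedup:.2f}x faster")
print(f"vs source:  {vs_source:.2f}x the original mp4")
```

Under that assumption the new file is about 3.6 times smaller than the current output, while still being slightly larger than the source mp4.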

@benoit74
Collaborator

benoit74 commented Nov 27, 2023

we chose to go with webm/vp9

vp9 or vp8? Looking at the settings, I believe we use vp8.

@rgaudin
Member

rgaudin commented Nov 27, 2023

Sorry it's a slip, vp8 of course

@benoit74
Collaborator

Progressing towards a merged PR on this will obviously need significant testing with many kinds of videos, and we (@kiwix) probably won't have sufficient bandwidth for this in the coming months.

Contributions are of course more than welcome.

Note, however, that this effort might conflict with another initiative we are considering around choosing a different video codec (and JS libs to fall back on when the reader/browser does not support this codec). The test set (and testing procedure) will nevertheless be very useful and most probably reused.

@kevinmcmurtrie
Author

I was going to kill this task on pixelmemory because it has built up over 161 GB of files...but not really. The filesystem compression ratio is over 3:1 so it's only 52GB on disk. That should not be happening for video files.

@kelson42
Contributor

kelson42 commented Dec 5, 2023

At this stage, it looks like we might move from webm/vp8 to mp4/h264. If we go in that direction, we should reassess our ffmpeg command line (in particular for low quality).

@kelson42 added the "question" label on Dec 5, 2023
@benoit74
Collaborator

benoit74 commented Dec 6, 2023

I was going to kill this task on pixelmemory because it has built up over 161 GB of files...but not really. The filesystem compression ratio is over 3:1 so it's only 52GB on disk. That should not be happening for video files.

I can only agree. And it matches the 3:1 ratio we both observed when changing the ffmpeg settings. I compressed (with default Zip settings on Mac) the "big" video I previously obtained with the current scraper ffmpeg settings, and I confirm it compresses very well (again a 3:1 ratio, going from 2.7MB to 868MB), which shouldn't be possible for a video file.
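
The same check can be scripted. Below is a minimal sketch using Python's standard `zlib` module as a stand-in for Zip (both use DEFLATE); the function names are hypothetical. A well-encoded video should come out close to 1.0, so a ratio near 3 suggests the file carries a lot of padding or redundant data.

```python
import os
import zlib


def compression_ratio(data, level=6):
    """Return original size / compressed size; about 1.0 means incompressible."""
    if not data:
        raise ValueError("cannot compute a ratio for empty input")
    return len(data) / len(zlib.compress(data, level))


def file_compression_ratio(path, level=6):
    """Read a whole file into memory and return its compression ratio."""
    with open(path, "rb") as handle:
        return compression_ratio(handle.read(), level)


# Sanity check on synthetic data: random bytes barely compress,
# while a run of zero bytes compresses extremely well.
print(compression_ratio(os.urandom(100_000)))
print(compression_ratio(b"\x00" * 100_000))
```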

@benoit74
Collaborator

benoit74 commented Dec 6, 2023

Edit: 868KB, not 868MB

@kelson42 pinned this issue on Feb 3, 2024
@benoit74 added this to the 1.2.0 milestone on Feb 14, 2024
@benoit74 self-assigned this on Feb 14, 2024
@benoit74 added the "bug" label and removed the "question" label on Feb 14, 2024
@kelson42 unpinned this issue on Jun 11, 2024