Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix VP8 low bitrate #122

Merged
merged 9 commits into from
Feb 13, 2024
Merged

Fix VP8 low bitrate #122

merged 9 commits into from
Feb 13, 2024

Conversation

kevinmcmurtrie
Copy link
Contributor

@kevinmcmurtrie kevinmcmurtrie commented Jan 28, 2024

Edits by @benoit74

Enabler for openzim/kolibri#75
Fix #123

Changes

  • Rework the VideoWebmLow preset for faster encoding and smaller file size (preset has been bumped to version 2)
    • target bitrate -b:v is reduced to 128k instead of 300k (deemed sufficient)
    • -maxrate and -minrate have been removed (otherwise the quantizer has no room for adaptation)
    • -qmin is reduced to 18 instead of 30 (significantly reduce the bitrate on very still videos)
    • -qmax is decreased to 40 instead of 42
    • other settings are left as-is
  • When reencoding a video, ffmpeg now uses only 1 CPU thread by default (new arg to reencode allows to override this default value)
  • New utility to encode a video with an existing preset (for local debugging typically)

Original comment

openzim/kolibri#75

The VP8 values appear to have been copied from the x264 options but these codecs do not share the same scale.

Removed contradictory options. Use ordinary average bitrate target with min/max bounds on quality.

@benoit74
Copy link
Collaborator

Thank you for this contribution!

  • is it possible that you share with us the test raw and encoded videos (with old and new presets)? I still think we should also test with videos like this one: https://key.endlessos.org/en/explore/#/topics/c9d7f950ab6b5a1199e3d6c10d7f0103/c/8229720da4b95bb090b90111fe8c6c3f (which we encounter a lot in khanacademy)
  • why did you simplified so much the test by getting rid of most of it? (I agree the added value is limited but still, it is easy to fix only what need to be given your new settings)
  • you did not modified the VideoWebmHigh preset, is it because this one is OK? (is does it suffer the same flaw that it has been copied from x264?)
  • please fix the CI error(s)

@kevinmcmurtrie kevinmcmurtrie marked this pull request as draft January 29, 2024 08:36
@rgaudin
Copy link
Member

rgaudin commented Jan 29, 2024

Also I find it odd to not only fix the problematic flags but to change the target bitrate without mentioning it 🤓

@kevinmcmurtrie
Copy link
Contributor Author

kevinmcmurtrie commented Jan 29, 2024

Also I find it odd to not only fix the problematic flags but to change the target bitrate without mentioning it 🤓

The original bitrate isn't real because the numbers were impossible. It was actually < 200 kbps padded up to 300kbps. That goes into a compressed ZIM and the padding shrinks. My new value of 200 kbps might actually be higher. openzim/kolibri#75 (comment)

It needs some testing. I meant for this to be a draft PR but clicked it wrong.

@kevinmcmurtrie
Copy link
Contributor Author

Why.Letters.mp4

Why Letters(new).webm
Why Letters(old).webm

@rgaudin
Copy link
Member

rgaudin commented Feb 3, 2024

This looks very good @kevinmcmurtrie ; thank you.

@kelson42
Copy link
Contributor

kelson42 commented Feb 3, 2024

I don't have an eye for this kind of things but to me it looks like no loss in quality for a big gain in size. I love it. I can not wait to have our main video based scrapers released with this new configuration. @kevinmcmurtrie Thank you very much.

@benoit74
Copy link
Collaborator

benoit74 commented Feb 5, 2024

@kevinmcmurtrie let me check this QA issue, never mind

@benoit74
Copy link
Collaborator

benoit74 commented Feb 5, 2024

@rgaudin any idea why codecov upload is failing in qa job and why 3.6 and 3.7 jobs are still expected while they have been removed from the CI configuration?

@benoit74
Copy link
Collaborator

benoit74 commented Feb 5, 2024

@kevinmcmurtrie

First, thank you!

Aside the CI issue we will solve, do you need to change more things or should we consider this PR ready?

And I would like to keep a copy of your test videos so that we reuse them next time we need to make a change in presets / encoders. These are precious assets to avoid lengthy discussions.

"Why letters" is a khan academy video, so the license is the Khan one, correct?

Do you know where the two others come from? Do you know what is the license associated with the "Earth quake and social media" video? (the last video has an explicit CC license, but I would prefer to know where it comes from so that we do not discover too late that the video is copyrighted but someone added a CC watermark for fun)

@rgaudin
Copy link
Member

rgaudin commented Feb 5, 2024

@rgaudin any idea why codecov upload is failing in qa job and why 3.6 and 3.7 jobs are still expected while they have been removed from the CI configuration?

3.6 and 3.6 are preventing codecov from running so that explains it.
Why do we still have 3.6 and 3.7? I don't know. I've seen that in the past ; Seems like a Github issue ; as you can see the checks time does not mention them… but since the overall status is pending, codecov is not running.

@benoit74
Copy link
Collaborator

benoit74 commented Feb 5, 2024

Ok for 3.6 and 3.7
About codecov, I was speaking about this error: https://github.com/openzim/python-scraperlib/actions/runs/7781401886/job/21215769840?pr=122#step:7:155

@rgaudin
Copy link
Member

rgaudin commented Feb 5, 2024

Ah… I don't know. Maybe it's because this is an external PR, maybe we're using an old version of codecov…

I thought maybe the CODECOV_TOKEN had changed so I updated the secret and rerun the action but it didn't help.

I see that the CODECOV_TOKEN env looks empty in the command. I believe github displays stars instead for secrets…
Screenshot 2024-02-05 at 09 24 32

@kevinmcmurtrie
Copy link
Contributor Author

"Why letters" is a khan academy video, from the given example.
"Video 11: An introduction" is MIT Open Learning.
"Earthquakes and Social Media" is an NBC news broadcast I recorded for Amy (one of the speakers). Certain uses might be "fair use" but don't distribute it.

@kevinmcmurtrie kevinmcmurtrie marked this pull request as ready for review February 6, 2024 04:25
@kevinmcmurtrie
Copy link
Contributor Author

I made it a Draft because I feel like I don't have enough time at the moment to be a good contributor. My goal was just to come up with some better codec parameters. You can copy them over to a new PR or use this.

@benoit74
Copy link
Collaborator

benoit74 commented Feb 6, 2024

I made it a Draft because I feel like I don't have enough time at the moment to be a good contributor. My goal was just to come up with some better codec parameters. You can copy them over to a new PR or use this.

Thank you again for this and the tests you've made on a bunch of videos. It is a very good contribution, even if you consider it is not yet completed, because it was made with quality in mind and because it created a momentum and we feel like we are close to being able to release this.

Since I'm working today on Kolibri to fix other scraper issues you've reported, I will test these new codec parameters. They might not be "optimal" but we have to admit they are already way better than the ones we are using currently, so it makes sense to release them to make a move in the good direction (once I've confirmed they work with more videos).

@kevinmcmurtrie
Copy link
Contributor Author

Check the compressed size. The old videos had a lot of bitrate padding.

@benoit74 benoit74 added this to the 3.2.1 milestone Feb 12, 2024
kevinmcmurtrie and others added 5 commits February 13, 2024 09:50
Tested with nearly motionless security camera footage and a television news recording of a press release.
Better tuning for very low motion and very high motion.
Copy link

codecov bot commented Feb 13, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (a7fe235) 100.00% compared to head (a7307d1) 100.00%.

Additional details and impacted files
@@            Coverage Diff            @@
##              main      #122   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           32        32           
  Lines         1337      1345    +8     
  Branches       227       229    +2     
=========================================
+ Hits          1337      1345    +8     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@benoit74
Copy link
Collaborator

I took the liberty to update first comment to reflect the final content of this PR.

I've reencoded and uploaded all test videos to https://tmp.kiwix.org/ci/test-videos/

I've also added a utility to encode video (debug), fixed the ffmpeg threads usage.

I feel like there is mostly no change in visual quality between v1 and v2, but as already said computation takes way less time and resulting file is way smaller (especially for still videos like khan academy ones).

@rgaudin feedback is welcomed on:

  • the quality of v2 encoding, if any (there isn't any change compared to @kevinmcmurtrie tests, I just ran them again and replaced the video without clear copyrights by a TED one)
  • the PR as a whole, especially my last commits

Copy link
Member

@rgaudin rgaudin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@rgaudin rgaudin merged commit e167913 into openzim:main Feb 13, 2024
10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Limit ffmep to using only one CPU thread
4 participants