Added support for HW-accelerated video encoding. #125

peci1 · 2020-11-10T20:45:21Z

This PR adds support for HW-accelerated video encoding to the VideoEncoder class. Supported (so far) are NVENC on Windows x86_64 (untested) and Linux x86_64 (tested), VAAPI on Linux x86_64 (tested) and QSV on Windows x86_64 (tested) and Linux x86_64 (untested). It should be quite easy to add more.

For better review, I've split this PR into smaller commits, as I had to do many small fixes in the existing pieces of the av library. Each of the commits has a commit message that explains why the change/fix was needed. The hw-accelerated encoding itself is then commited as the last single large commit.

This PR "depends" on #118 (it is based off of it), and commit 2c19bba is a cherry-pick of f518c91 from main. It should be relatively independent from #105 - I guess the merge shouldn't be difficult.

The general idea was to make the support "non-intrusive" - it can be switched off via a build option. When built in, it doesn't do anything until you tell it so via a special VideoEncoder::Start() signature, or via environment variables. It doesn't add any new dependencies to the library, nor does it increase the required versions of libav. The whole HWEncoder support class is package-private (it does not export its symbols nor install its header file). That is because it is highly coupled with VideoEncoder and it is not meant to be a general-purpose acceleration library.

I'm open for discussion about the architecture of the added code. This was just my best try.

HW-acceleration support can be completely turned off by build option IGN_COMMON_BUILD_HW_VIDEO (defaults to ON).

By default, the HW-accelerated encoders are not used. It is for stability and security reasons - it is HW interaction in the end, so segfaults may occur, though I did my best to handle all error cases correctly.

The search for HW encoders has to be enabled by setting environment variable IGN_VIDEO_ALLOWED_ENCODERS=ALL (or a specific encoder if you know which works for you). If you do not know which one would work, setting ALL should be fine - it triggeres a loop over all available HW encoders and tries them with a good set of default GPU devices. That could work well on roughly 80% of supported platforms. When the automatic configuration fails, the encoder falls back to software encoding.

There are 2 more variables that affect the HW-accelerated encoder configuration: IGN_VIDEO_ENCODER_DEVICE and IGN_VIDEO_USE_HW_SURFACE. See VideoEncoder.hh for their documentation.

Signed-off-by: Martin Pecka <peckama2@fel.cvut.cz>

peci1 · 2020-11-10T20:53:38Z

I've also added an integration test that decodes-encodes-decodes. By default, it runs without HW acceleration, so no code from HWEncoder is tested, but if you know you're building on a machine with a GPU, the acceleration tests can be enabled via the environment variables.

peci1 · 2020-11-10T21:09:47Z

Windows HW acceleration was tested with this build of ffmpeg: https://www.gyan.dev/ffmpeg/builds/packages/ffmpeg-4.3.1-2020-10-28-full_build-shared.7z .

peci1 · 2020-11-10T21:22:52Z

Outch, segfault on Mac even without HW acceleration. Unfortunately, I don't have access to any to debug...

peci1 · 2020-11-11T04:31:15Z

Okay, after misusing the buildfarm a bit, I got it, swscale needs aligned data on Mac. But I'm done for today, so I'll apply it properly tomorrow.

But thanks to this Mac-only bug, I also found the reason why decoders and encoders shortened the videos - the implementations of NextFrame() and Encode() do not "finish" the operation by entering draining modes of the respective codecs. I've already got a fix for the decoder part, and it nicely reads the exact same number of frames as ffmpeg. Encoder should be analogic, but it will require the last drained frames to be actually written from SaveToFile(). I'll add these fixes to this PR, too.

peci1 · 2020-11-11T16:22:40Z

Okay, here it goes.

Mac segfaults are fixed (by defaulting to using 32-byte-aligned frames everywhere except raw input/output buffers). This should be potentially beneficial for other systems with e.g. custom builds of libav which might use more optimized codecs that require alignment.

Draining is implemented for both encoder and decoder. Decoding now doesn't lose a single frame (it reads the same number of frames as ffprobe reports), encoding loses exactly one frame (tested on two video files) compared to 10-20 frames lost without this fix.

I figured out that all changes proposed in #114 were needed here, too, so I closed #114.

I also transplanted commit d2ce032 from #115 (which has been merged to the still open #105). Cherry-pick was not possible because that would need bringing in more changes from #105 which I did not want. But the merge should be fine.

mjcarroll · 2020-11-11T16:49:50Z

Fantastic @peci1, I appreciate all of your work here. I'm a bit busy today and tomorrow (ROS World), but I'll try to set aside some time to get a review for you!

peci1 · 2020-11-17T13:11:18Z

Rebased on top of ign-common3 to incorporate changes from #105 . All looks good!

peci1 · 2020-11-21T00:06:17Z

There's a funny (well, not really that funny) thing I stumbled upon. I'm running playback recorders with this PR on a 40-core 8-GPU server. I was looking forward for having the recordings pretty fast. But no way - the server has GeForce 2080 TIs, and NVidia puts an artificial limit on the number of currently running NVENC sessions when using GeForce cards - 3 sessions per system (yes, per system, not per GPU). So I can only run 3 concurrent playback recorders. The server-class cards like Tesla do not have this limitation (don't ask me why we have GeForces in a server).

Maybe that could be mentioned somewhere? In this case, avcodec_open2 returns ENOMEM, which is not really a unique way to identify this case, but we could at least add an additional text to the error message pointing to this problem. There is an unofficial patch which removes the artificial limit by patching libnvidia-encode.so library. Should the error message point to this repo, too?

This is how it looks in the console (for the 4th session):

[GUI] [Msg] Recording video using sim time.
[GUI] [Msg] Recording video in lockstep mode
[GUI] [Msg] Recording video using bitrate: 8000000
[GUI] [Msg] Found known HW encoder: h264_nvenc
[GUI] [Msg] Initialized NVENC on device /dev/nvidia7
[GUI] [Msg] Using encoder h264_nvenc
[GUI] [Err] [Util.cc:58] ffmpeg [h264_nvenc] OpenEncodeSessionEx failed: out of memory (10)

[GUI] [Err] [VideoEncoder.cc:507] Could not open video codec: Cannot allocate memory. Video encoding is not started

What's even worse, once this 4th session starts recording, all other playback recorders segfault.

Also, this loop from ign-gazebo results in repeated calls to VideoEncoder::Start(): https://github.com/ignitionrobotics/ign-gazebo/blob/a5abacfa74db6e8db0218bd32c55980e15cf249b/src/systems/camera_video_recorder/CameraVideoRecorder.cc#L297 . When the encoder fails to open (as in the 4th stream case), IsEncoding() stays false, and the next PostRender callback will try to initialize the recorder again. ign-gazebo does not detect that the encoder can't be brought up. Should I add some method like VideoRecorder::IsOk() (there'd probably be a better name) that could tell the user that repeated calls to Start() will fail? Or maybe it's enough to actually read the return value of Start()? The ign-gazebo code doesn't do it... It could be specified that if Start() returns false, repeated calls to it with the same parameters will also fail (until some external circumstances change - e.g. the GPU being freed up from other tasks).

mjcarroll · 2020-11-23T14:29:48Z

I read about this briefly, so a quick question would be is this a limitation purely for market differentiation?

I don't think that you would be unique in having multiple consumer-grade Nvidia cards in a server. The performance-per-dollar makes a lot of sense for many users to prefer the GeForce over the server-class cards (myself included!).

For the time, I think it makes sense to have some user-configurable soft cap in the the VideoEncoder that is defaulted to the most common value for NVENC (3 in this case).

Maybe a reasonable API would be something like StreamsAvailable so that the CameraVideoRecorder will only attempt to Start if there is still underlying capability?

peci1 · 2020-11-24T00:03:33Z

Yes, the limit is in the drivers - 8 GPUs would be okay to encode at least 8 streams, not only 3.

The problem is that the limit is system-wide across all processes. And I googled a bit but haven't found an API to determine the number of remaining sessions. So I think we can just detect when the system was not able to bring up the decoder, and when we detect the ENOMEM error, we can hint the user that he might be looking in this direction. Maybe I could also add an env var to specify whether software encoder fallback is allowed? Maybe somebody would rather want the encoder to fail than to run on CPU...

mjcarroll · 2020-11-24T14:00:55Z

Quite annoying that they don't surface an API for this, or even something in /proc.

I think that your approach sounds good. I would prefer that we could configure the fallback behavior through the API itself rather than environment variables, if possible.

Since the point of this is to keep the API generic, maybe it would make sense to have a mechanism for setting implementation-specific flags, via something like GET_CAPS/SET_CAPS?

peci1 · 2020-11-24T14:58:11Z

Since the point of this is to keep the API generic, maybe it would make sense to have a mechanism for setting implementation-specific flags, via something like GET_CAPS/SET_CAPS?

That sounds a bit more complicated than I wanted the API to be. Or maybe I just misunderstood you, can you elaborate on that?

Environment variables are just one option to configure the behavior - they're used when you call the original Start() signature. But I've also added the extended signature which allows to configure all the parameters in code. The reason for making a part of the API configurable via environment variables was that it would take a long time for all downstream code to properly implement all the choices and propagate them to the user via a UI. And some downstream code could be completely uninterested in implementing it. Environment variables are quite simple and anyone can start using the acceleration right away.

mjcarroll · 2020-11-24T15:53:09Z

That sounds a bit more complicated than I wanted the API to be. Or maybe I just misunderstood you, can you elaborate on that?

No, on second thought, you are right. It would take a long time for this to propagate (in the case that downstream folks would even care about it).

I think as long as the environment variables have sane defaults and are documented well, it shouldn't be an issue.

As a follow-up, would you be interested in writing a short tutorial to cover how the hardware encoding works? I believe it would give it more visibility.

peci1 · 2020-11-24T16:03:17Z

as long as the environment variables have sane defaults

The default is to use solely software rendering :)

As a follow-up, would you be interested in writing a short tutorial to cover how the hardware encoding works? I believe it would give it more visibility.

Sure, I can do that once the feature is finalized and merged.

… - we're not resizing at all, just changing pixel format. Signed-off-by: Martin Pecka <peci1@seznam.cz>

…systems. Signed-off-by: Martin Pecka <peckama2@fel.cvut.cz>

…xhausted. This will make sure decoding does not lose frames at the end of the video file. Signed-off-by: Martin Pecka <peckama2@fel.cvut.cz>

It can be completely turned off by build option IGN_COMMON_BUILD_HW_VIDEO (defaults to ON). By default, the HW-accelerated encoders are not used. It is for stability and security reasons - it is HW interaction in the end, so segfaults may occur, though I did my best to handle all error cases correctly. The search for HW encoders has to be enabled by setting environment variable IGN_VIDEO_ALLOWED_ENCODERS=ALL (or a specific encoder if you know which works for you). If you do not know which one would work, setting ALL should be fine - it triggeres a loop over all available HW encoders and tries them with a good set of default GPU devices. That could work well on roughly 80% of supported platforms (x86_64 Win 10 and x86_64 Linux). When the automatic configuration fails, the encoder falls back to software encoding. There are 2 more variables that affect the HW-accelerated encoder configuration: IGN_VIDEO_ENCODER_DEVICE and IGN_VIDEO_USE_HW_SURFACE. See VideoEncoder.hh for their documentation. The whole HWEncoder support class is package-private (it does not export its symbols nor install its header file). That is because it is highly coupled with VideoEncoder and it is not meant to be a general-purpose acceleration library. No libav functions used in this commit should increase the requirements for minimum versions of any libav library. Signed-off-by: Martin Pecka <peci1@seznam.cz> Signed-off-by: Martin Pecka <peckama2@fel.cvut.cz>

This should prevent some potential crashes. Signed-off-by: Martin Pecka <peckama2@fel.cvut.cz>

…topped. This makes sure frames at the end of the encoding session are not dropped. Signed-off-by: Martin Pecka <peckama2@fel.cvut.cz> Signed-off-by: Martin Pecka <peci1@seznam.cz>

…created. Also add a simple VideoEncoder::Start() override that allows turning HW acceleration detection on/off. Signed-off-by: Martin Pecka <peckama2@fel.cvut.cz> Signed-off-by: Martin Pecka <peci1@seznam.cz>

peci1 · 2021-01-08T13:55:40Z

I rebased on the latest changes in #118.

I added a check for the special case of NVENC failing to start due to ENOMEM. It now prints this warning:

  ignwarn << "If this computer has non-server-class GPUs (like GeForce), "
          << "it is possible that you have reached the maximum number of "
          << "simultaneous NVENC sessions (most probably 3). This limit is "
          << "not per GPU, but per the whole computer regardless of the "
          << "number of GPUs installed. You can try to circumvent this "
          << "limit by using the unofficial driver patch at "
          << "https://github.com/keylase/nvidia-patch . If you cannot (or "
          << "do not want) install this patch, do not run more than 3 "
          << "HW-accelerated video encoding tasks on this computer "
          << "simultaneously.\n";

I also added an override of VideoEncoder::Start() with just a simple bool argument _allowHwAccel which turns on/off the configuration using environment variables. So, it can be suggested to downstream code to first call the Start() method as they used to, and if it fails, they can call it once more with _allowHwAccel=false to see if disabling HW acceleration would help. I first thought of some more automated fallback mechanism, but then I figured out the NVENC error happens too late - after the NVENC encoder was detected and selected by ConfigHWAccel(). Doing some automated attempt at repairs by detecting what exactly failed during the run of VideoEncoder would be a pretty complicated topic. Randomly turning on/off things would probably not help.

…n Windows (happens on buildfarm but not locally). Signed-off-by: Martin Pecka <peckama2@fel.cvut.cz> Signed-off-by: Martin Pecka <peci1@seznam.cz>

mjcarroll · 2021-02-03T18:19:14Z

@peci1 Can you just clean up the merge conflicts here and I think this will be good to go.

# Conflicts: # av/src/VideoEncoder.cc # include/ignition/common/FlagSet.hh # src/FlagSet_TEST.cc

…err2str_cpp . Signed-off-by: Martin Pecka <peckama2@fel.cvut.cz> Signed-off-by: Martin Pecka <peci1@seznam.cz>

Signed-off-by: Martin Pecka <peckama2@fel.cvut.cz> Signed-off-by: Martin Pecka <peci1@seznam.cz>

peci1 · 2021-02-03T23:36:16Z

All green!

I re-enabled the video tests disabled in #149 because they now work. The PermissionError when deleting files on Windows happens when the file is still open. I guess that happened because of a missing free in VideoEncoder::Stop(). This PR improves the behavior of this method and fixed one leaked resource which might have been the reason for the test failures on Windows. Feel free to run a few more test rounds to see if the tests are not flaky.

I also merged the av_err2str function added in #149 with the one I defined earlier in this PR. I didn't like redefining a library function, so I rather kept the changed name. I also decided to keep it in the header file so that other files using ffmpeg have access to this function and do not have to reimplement it.

mjcarroll · 2021-02-04T13:57:49Z

Thanks @peci1!

If I could ask one more small favor, it would be great to have a short tutorial on how to use this feature. Can you add that in a follow-up?

peci1 · 2021-02-04T13:59:30Z

Sure, I'll have a look at it. Should I add it to ign-common/tutorials?

mjcarroll · 2021-02-04T14:06:43Z

Yes, and add it to the index here: https://github.com/ignitionrobotics/ign-common/blob/ign-common3/tutorials.md.in

peci1 · 2021-02-08T02:04:24Z

Tutorial: #169.

peci1 and others added 2 commits November 2, 2020 22:47

Added FlagSet helper class that represents sets of flags.

96bf38f

Signed-off-by: Martin Pecka <peckama2@fel.cvut.cz>

Merge branch 'ign-common3' into flag_set

3c39f76

peci1 requested a review from mjcarroll as a code owner November 10, 2020 20:45

github-actions bot added 🏢 edifice Ignition Edifice 🏰 citadel Ignition Citadel 📜 blueprint Ignition Blueprint 🔮 dome Ignition Dome labels Nov 10, 2020

peci1 force-pushed the hw_encoding branch from 5cde4fe to 2367c1c Compare November 10, 2020 20:50

peci1 force-pushed the hw_encoding branch from d1762b0 to b0a64fe Compare November 10, 2020 21:13

peci1 force-pushed the hw_encoding branch from e64798c to 79cf936 Compare November 11, 2020 12:02

peci1 mentioned this pull request Nov 11, 2020

Video decoder fixes #114

Closed

chapulina assigned mjcarroll Nov 16, 2020

Merge branch 'ign-common3' into flag_set

e752799

peci1 force-pushed the hw_encoding branch from 378bcd1 to 1d24265 Compare November 17, 2020 12:56

peci1 and others added 7 commits January 8, 2021 13:33

av::VideoEncoder: Do not use any kind of interpolation in sws context…

602b70b

… - we're not resizing at all, just changing pixel format. Signed-off-by: Martin Pecka <peci1@seznam.cz>

av::Video: Use 32-byte-aligned output frame to avoid crashes on some …

4ef2339

…systems. Signed-off-by: Martin Pecka <peckama2@fel.cvut.cz>

av::Video: Implement draining the decoder after input file has been e…

c667812

…xhausted. This will make sure decoding does not lose frames at the end of the video file. Signed-off-by: Martin Pecka <peckama2@fel.cvut.cz>

av::VideoEncoder: Use 32-byte-aligned frames for video encoder.

59a0a5d

This should prevent some potential crashes. Signed-off-by: Martin Pecka <peckama2@fel.cvut.cz>

av::VideoEncoder: Implemented draining the encoder when encoding is s…

4fa7ade

…topped. This makes sure frames at the end of the encoding session are not dropped. Signed-off-by: Martin Pecka <peckama2@fel.cvut.cz> Signed-off-by: Martin Pecka <peci1@seznam.cz>

av::VideoEncoder: Be more informative when NVENC session fails to be …

8e23c28

…created. Also add a simple VideoEncoder::Start() override that allows turning HW acceleration detection on/off. Signed-off-by: Martin Pecka <peckama2@fel.cvut.cz> Signed-off-by: Martin Pecka <peci1@seznam.cz>

peci1 force-pushed the hw_encoding branch from 1d24265 to 8e23c28 Compare January 8, 2021 13:45

av::VideoEncoder: Allow one more lost frame from video_encoder test o…

37a6ed3

…n Windows (happens on buildfarm but not locally). Signed-off-by: Martin Pecka <peckama2@fel.cvut.cz> Signed-off-by: Martin Pecka <peci1@seznam.cz>

peci1 force-pushed the hw_encoding branch from 4a81b58 to 37a6ed3 Compare January 8, 2021 14:27

peci1 added 3 commits February 3, 2021 23:43

Merge branch 'ign-common3' into hw_encoding

6a13245

# Conflicts: # av/src/VideoEncoder.cc # include/ignition/common/FlagSet.hh # src/FlagSet_TEST.cc

av::VideoEncoder: Moved av_err2str redefinition to a new function av_…

99a5060

…err2str_cpp . Signed-off-by: Martin Pecka <peckama2@fel.cvut.cz> Signed-off-by: Martin Pecka <peci1@seznam.cz>

av::VideoEncoder: Enabled Windows tests again (they work for me).

baae6ac

Signed-off-by: Martin Pecka <peckama2@fel.cvut.cz> Signed-off-by: Martin Pecka <peci1@seznam.cz>

mjcarroll approved these changes Feb 4, 2021

View reviewed changes

mjcarroll merged commit f2637fe into gazebosim:ign-common3 Feb 4, 2021

peci1 mentioned this pull request Feb 6, 2021

👨‍🌾 Regression/Flake on VideoEncoderTest.Exists for ign-common3 on Windows #148

Closed

scpeters mentioned this pull request Mar 9, 2021

INTEGRATION_video_encoder fails on arm64, armhf debbuilds #180

Closed

This was referenced Mar 9, 2021

🎈 3.11.0 #178

Merged

Relax expectation so encoder test passes on ARM #183

Merged

jennuine mentioned this pull request Jul 21, 2022

Add code coverage ignore file #394

Merged

9 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Added support for HW-accelerated video encoding. #125

Added support for HW-accelerated video encoding. #125

peci1 commented Nov 10, 2020 •

edited

Loading

peci1 commented Nov 10, 2020

peci1 commented Nov 10, 2020

peci1 commented Nov 10, 2020

peci1 commented Nov 11, 2020

peci1 commented Nov 11, 2020

mjcarroll commented Nov 11, 2020

peci1 commented Nov 17, 2020

peci1 commented Nov 21, 2020

mjcarroll commented Nov 23, 2020

peci1 commented Nov 24, 2020

mjcarroll commented Nov 24, 2020

peci1 commented Nov 24, 2020

mjcarroll commented Nov 24, 2020

peci1 commented Nov 24, 2020

peci1 commented Jan 8, 2021

mjcarroll commented Feb 3, 2021

peci1 commented Feb 3, 2021

mjcarroll commented Feb 4, 2021

peci1 commented Feb 4, 2021

mjcarroll commented Feb 4, 2021

peci1 commented Feb 8, 2021

Added support for HW-accelerated video encoding. #125

Added support for HW-accelerated video encoding. #125

Conversation

peci1 commented Nov 10, 2020 • edited Loading

peci1 commented Nov 10, 2020

peci1 commented Nov 10, 2020

peci1 commented Nov 10, 2020

peci1 commented Nov 11, 2020

peci1 commented Nov 11, 2020

mjcarroll commented Nov 11, 2020

peci1 commented Nov 17, 2020

peci1 commented Nov 21, 2020

mjcarroll commented Nov 23, 2020

peci1 commented Nov 24, 2020

mjcarroll commented Nov 24, 2020

peci1 commented Nov 24, 2020

mjcarroll commented Nov 24, 2020

peci1 commented Nov 24, 2020

peci1 commented Jan 8, 2021

mjcarroll commented Feb 3, 2021

peci1 commented Feb 3, 2021

mjcarroll commented Feb 4, 2021

peci1 commented Feb 4, 2021

mjcarroll commented Feb 4, 2021

peci1 commented Feb 8, 2021

peci1 commented Nov 10, 2020 •

edited

Loading