Improve optional remote video transcription network usage #1802

Open
lfcnassif opened this issue Aug 6, 2023 · 9 comments

Comments

@lfcnassif
Member

As suggested on #1801 (comment)

@lfcnassif
Member Author

lfcnassif commented Aug 6, 2023

What would be a good compressed audio format to send, without taking too long to convert the audio? Or should we just send the original audio channel extracted from the videos as is?

@wladimirleite
Member

In theory, FLAC would be a good format (fast encoder/decoder, good quality and compression).
However, I think MPlayer can only decode it. In fact, I couldn't find any suitable option using MPlayer other than the PCM output (already used on the server side), but I just took a quick look, so maybe there is a solution.

Another option would be to use other tools, like FFmpeg or MEncoder. But that would add another dependency for a very specific purpose.

Maybe we could use the same conversion done on the server side (PCM) and leave the use of a better format as a future improvement. Although the WAV files are large, they are usually much smaller than the videos themselves. And it won't be necessary to run another conversion on the server side.

One important point: the current command already includes -vo null -vc null, but I would also add -novideo.
In tests with a set of large videos from various sources and formats, -novideo made audio extraction several times faster.
Of course this depends on the file length, the hardware, and the video and audio formats, but the parameter seems to help a lot in general.
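For illustration only, here is a minimal sketch of how such a command line could be assembled. The thread only confirms `-vo null -vc null` and the proposed `-novideo`; the `-ao pcm:fast:file=...` output option, the function name, and the file names are assumptions made for this example, not the project's actual command.

```python
from pathlib import Path


def build_audio_extraction_command(video: Path, wav_out: Path,
                                   mplayer: str = "mplayer") -> list[str]:
    """Return an MPlayer argv that dumps a video's audio track to a WAV file.

    Hypothetical helper: the exact command used by the project is not shown
    in this thread, so the PCM output option below is an assumption.
    """
    return [
        mplayer,
        "-vo", "null",   # no video output driver (already in the current command)
        "-vc", "null",   # no video codec (already in the current command)
        "-novideo",      # proposed addition: skip the video stream entirely
        # Assumed audio output: write PCM/WAV faster than real time.
        # Note: ':' separates suboptions, so paths containing ':' need care.
        "-ao", f"pcm:fast:file={wav_out}",
        str(video),
    ]


if __name__ == "__main__":
    cmd = build_audio_extraction_command(Path("input.mp4"), Path("audio.wav"))
    print(" ".join(cmd))
    # To actually run it (requires MPlayer on the PATH):
    # import subprocess; subprocess.run(cmd, check=True)
```

Building the argument list separately from running it makes it easy to time the extraction with and without `-novideo` on the same file.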

lfcnassif changed the title from "Improve optional video remote transcription network usage" to "Improve optional remote video transcription network usage" on Aug 7, 2023
@lfcnassif
Member Author

lfcnassif commented Aug 7, 2023

> One important point: the current command already includes -vo null -vc null, but I would also add -novideo.
> In tests with a set of large videos from various sources and formats, -novideo made audio extraction several times faster.
> Of course this depends on the file length, the hardware, and the video and audio formats, but the parameter seems to help a lot in general.

Hi @tc-wleite, thanks for the performance tests! Sure, we can add the -novideo option.

About the audio format to send: is it possible to extract the audio from videos as is, without any conversion, using MPlayer? That way we could benefit from the original compression.

@wladimirleite
Member

wladimirleite commented Aug 7, 2023

> About the audio format to send: is it possible to extract the audio from videos as is, without any conversion, using MPlayer? That way we could benefit from the original compression.

I tried to do that, but couldn't find a way to do it with MPlayer. It is focused on playback, so it supports a lot of input formats but not many output ones.

@wladimirleite
Member

I found MPlayer's -dumpaudio option, which dumps the compressed audio stream from a video as it originally is. However, the resulting file will only be playable in very specific cases. In practice, for the ~20 videos in several formats that I used to test, the extracted files can't be played, identified, or converted to PCM by MPlayer. So it doesn't seem useful for what we need.

@lfcnassif
Member Author

That's bad news, thanks for investigating @tc-wleite!

@lfcnassif
Member Author

lfcnassif commented Aug 8, 2023

Hi @tc-wleite, a simple idea would be to use a general-purpose file compression algorithm already supported by Apache Commons Compress. I ran a few compression algorithms using 7-Zip on a slice of the TEDx pt-BR test set (1033 audio files):

| WAV | FLAC | ZIP | BZIP2 | LZMA2 |
|-------|-------|-------|-------|-------|
| 433MB | 238MB | 341MB | 277MB | 272MB |

  • FLAC conversion was done using FFmpeg default options
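To make the sizes above easier to compare, the following short sketch computes the bandwidth saving of each format relative to the uncompressed WAV total. It only restates the figures from the table; the variable and function names are made up for this example.

```python
# Total sizes in MB for the 1033-file test slice, copied from the table above.
sizes_mb = {"WAV": 433, "FLAC": 238, "ZIP": 341, "BZIP2": 277, "LZMA2": 272}


def savings_percent(sizes: dict[str, float], baseline: str = "WAV") -> dict[str, float]:
    """Percentage of bytes saved by each format compared with the baseline."""
    base = sizes[baseline]
    return {
        name: round(100 * (1 - size / base), 1)
        for name, size in sizes.items()
        if name != baseline
    }


if __name__ == "__main__":
    for name, pct in savings_percent(sizes_mb).items():
        print(f"{name}: {pct}% smaller than WAV")
```

On these numbers FLAC saves about 45%, LZMA2 about 37%, BZIP2 about 36%, and ZIP about 21% of the data sent over the network.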

Apache Commons Compress also supports other compression schemes.

PS: Measuring running times using 1 thread now...

@lfcnassif
Member Author

Running times using one 7z thread (non-solid mode) and FFmpeg executed multiple times for each file to convert to/from FLAC:

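|               | WAV   | FLAC  | ZIP   | BZIP2 | LZMA2 |
|---------------|-------|-------|-------|-------|-------|
| Size          | 433MB | 238MB | 341MB | 277MB | 272MB |
| Compression   | -     | 73s   | 30s   | 77s   | 170s  |
| Decompression | -     | 50s   | 5s    | 14s   | 14s   |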

Of course, running times with Apache Commons Compress will likely differ from those above.
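One rough way to read the table is to ask below which network bandwidth each option pays for itself. The sketch below assumes compression, transfer, and decompression run strictly one after another on a single thread, and that the measured 7-Zip and FFmpeg times carry over, which may not hold for the real pipeline. Under those assumptions, compressing wins when the bandwidth is lower than (MB saved) / (compression + decompression seconds).

```python
# Figures copied from the table above (sizes in MB, times in seconds).
SIZES_MB = {"WAV": 433, "FLAC": 238, "ZIP": 341, "BZIP2": 277, "LZMA2": 272}
TIMES_S = {  # (compression, decompression)
    "FLAC": (73, 50),
    "ZIP": (30, 5),
    "BZIP2": (77, 14),
    "LZMA2": (170, 14),
}


def break_even_bandwidth(sizes_mb: dict[str, float],
                         times_s: dict[str, tuple[float, float]],
                         baseline: str = "WAV") -> dict[str, float]:
    """Bandwidth in MB/s below which each format beats sending the baseline.

    Sending the baseline takes base/B seconds; sending a compressed copy takes
    size/B + overhead seconds. The two are equal at B = (base - size) / overhead.
    """
    result = {}
    for name, (compress_s, decompress_s) in times_s.items():
        saved_mb = sizes_mb[baseline] - sizes_mb[name]
        overhead_s = compress_s + decompress_s
        result[name] = saved_mb / overhead_s
    return result


if __name__ == "__main__":
    for name, mb_per_s in break_even_bandwidth(SIZES_MB, TIMES_S).items():
        print(f"{name}: worthwhile below ~{mb_per_s:.2f} MB/s "
              f"(~{mb_per_s * 8:.1f} Mbit/s)")
```

With these measurements ZIP breaks even at roughly 2.6 MB/s, BZIP2 at about 1.7 MB/s, FLAC at about 1.6 MB/s, and LZMA2 at about 0.9 MB/s. Overlapping compression with transfer, or using several threads, would shift these thresholds upward.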

@wladimirleite
Member

wladimirleite commented Aug 8, 2023

I thought about that too.
It seems like a good option that will save some network bandwidth without much overhead (in terms of required code, additional libraries, and processing time).
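As a rough illustration of the general-purpose compression idea, the sketch below compresses an in-memory WAV payload with three standard-library codecs and checks that each one round-trips losslessly. Python's `zlib`, `bz2`, and `lzma` modules stand in here for the DEFLATE, BZIP2, and LZMA2 (xz) support in Apache Commons Compress; the synthetic sine-wave input and all function names are assumptions for the example, so the ratios it prints say nothing about real speech audio.

```python
import bz2
import io
import lzma
import math
import struct
import time
import wave
import zlib


def make_test_wav(seconds: float = 2.0, rate: int = 16000, freq: float = 400.0) -> bytes:
    """Build a mono 16-bit PCM WAV file in memory containing a sine tone."""
    n = int(seconds * rate)
    frames = b"".join(
        struct.pack("<h", round(12000 * math.sin(2 * math.pi * freq * i / rate)))
        for i in range(n)
    )
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(rate)
        w.writeframes(frames)
    return buf.getvalue()


# name -> (compress, decompress); lzma's default container is xz (LZMA2 filter).
CODECS = {
    "zlib": (zlib.compress, zlib.decompress),
    "bz2": (bz2.compress, bz2.decompress),
    "lzma": (lzma.compress, lzma.decompress),
}


def benchmark(payload: bytes) -> dict[str, dict[str, float]]:
    """Compress and decompress the payload with each codec, recording size and time."""
    results = {}
    for name, (compress, decompress) in CODECS.items():
        t0 = time.perf_counter()
        packed = compress(payload)
        t1 = time.perf_counter()
        restored = decompress(packed)
        t2 = time.perf_counter()
        if restored != payload:
            raise ValueError(f"{name} did not round-trip the payload")
        results[name] = {"size": len(packed), "compress_s": t1 - t0, "decompress_s": t2 - t1}
    return results


if __name__ == "__main__":
    wav = make_test_wav()
    print(f"original: {len(wav)} bytes")
    for name, r in benchmark(wav).items():
        print(f"{name}: {r['size']} bytes, "
              f"compress {r['compress_s']:.4f}s, decompress {r['decompress_s']:.4f}s")
```

Because these codecs are lossless, the server receives exactly the same PCM data it would have produced itself, so no extra audio conversion step is needed after decompression.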
