Makes Google and Wav2Vec2 audio transcription not dependent of FFmpeg anymore #1267

Closed
lfcnassif opened this issue Aug 10, 2022 · 11 comments · Fixed by #1543
@lfcnassif
Member

FFmpeg is used to split audio files longer than 1 minute, and the user must add it to the PATH explicitly. Maybe we can use the already embedded MPlayer to split large audio files, removing that external dependency.

If anyone already knows the MPlayer command for that, please let me know.

@lfcnassif
Member Author

I wasn't able to find an MPlayer option to split audio files. Closing.

If anyone knows how, please let me know.

@wladimirleite wladimirleite self-assigned this Feb 24, 2023
@wladimirleite
Member

wladimirleite commented Feb 24, 2023

I am going to take a look and see if I can find a way to use MPlayer instead of FFmpeg to split large WAV audio files.

@lfcnassif
Member Author

Thank you @tc-wleite for looking into this! I searched the MPlayer manual twice in the past, but didn't find an obvious option for that...

@lfcnassif
Member Author

lfcnassif commented Feb 24, 2023

Some time ago I thought about splitting WAV files manually. I think splitting the raw audio data would be easy, since we always convert to PCM, mono, 16 kHz, 16 bits per sample, little endian. But a header with some WAV metadata would possibly need to be added to each chunk, not sure...
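
The manual approach mentioned in this comment could look roughly like the sketch below. It is only an illustration, not IPED code: the class name PcmWavHeader and the method wrap are hypothetical, and it assumes exactly the fixed format mentioned here (PCM, mono, 16 kHz, 16 bits per sample, little endian). Each chunk of raw samples gets the canonical 44-byte RIFF/WAVE header written in front of it.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of IPED: prepends a canonical 44-byte WAV header
// to a chunk of raw PCM samples (assumed mono, 16 kHz, 16-bit, little endian).
final class PcmWavHeader {
    static final int SAMPLE_RATE = 16000;
    static final int CHANNELS = 1;
    static final int BITS_PER_SAMPLE = 16;
    static final int HEADER_SIZE = 44;

    static byte[] wrap(byte[] pcm, int offset, int length) {
        int blockAlign = CHANNELS * BITS_PER_SAMPLE / 8; // bytes per sample frame
        int byteRate = SAMPLE_RATE * blockAlign;         // bytes per second
        ByteBuffer buf = ByteBuffer.allocate(HEADER_SIZE + length).order(ByteOrder.LITTLE_ENDIAN);
        buf.put("RIFF".getBytes(StandardCharsets.US_ASCII));
        buf.putInt(36 + length);                // size of everything after this field
        buf.put("WAVE".getBytes(StandardCharsets.US_ASCII));
        buf.put("fmt ".getBytes(StandardCharsets.US_ASCII));
        buf.putInt(16);                         // fmt chunk size for plain PCM (0x10)
        buf.putShort((short) 1);                // format tag 1 = PCM
        buf.putShort((short) CHANNELS);
        buf.putInt(SAMPLE_RATE);
        buf.putInt(byteRate);
        buf.putShort((short) blockAlign);
        buf.putShort((short) BITS_PER_SAMPLE);
        buf.put("data".getBytes(StandardCharsets.US_ASCII));
        buf.putInt(length);                     // number of audio data bytes that follow
        buf.put(pcm, offset, length);
        return buf.array();
    }

    public static void main(String[] args) {
        byte[] oneSecond = new byte[SAMPLE_RATE * 2]; // one second of silence
        byte[] wav = wrap(oneSecond, 0, oneSecond.length);
        System.out.println("WAV bytes: " + wav.length);
    }
}
```

With this format, a 60-second part is 16000 * 2 * 60 = 1,920,000 bytes of raw samples, so chunk boundaries always fall on whole sample frames as long as the chunk length is even.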

@wladimirleite
Member

Ok, I think I found a simple solution, using only Java code.
I tested a standalone program with a few audios, and it seems to work fine.
Unfortunately, my work PC is down because of building maintenance there, so I will only be able to integrate it into IPED and run proper tests next week.

@wladimirleite wladimirleite reopened this Feb 24, 2023
@wladimirleite wladimirleite changed the title from "Makes Google audio transcription not dependent of ffmpeg anymore" to "Makes Google and Wav2Vec2 audio transcription not dependent of FFmpeg anymore" Feb 24, 2023
@wladimirleite
Member

wladimirleite commented Feb 24, 2023

In case someone wants to check the idea, here is the standalone program:

import java.io.*;
import javax.sound.sampled.*;

public class WavSplit {
    public static void main(String[] args) throws Exception {
        int partDurationInSeconds = 60;
        // try-with-resources closes the input even if writing a part fails
        try (AudioInputStream ais = AudioSystem.getAudioInputStream(new File("in.wav"))) {
            AudioFormat format = ais.getFormat();
            int bytesPerFrame = format.getFrameSize();
            int framesPerPart = Math.round(format.getFrameRate() * partDurationInSeconds);
            byte[] partBytes = new byte[framesPerPart * bytesPerFrame];
            int numBytesRead;
            int seq = 0;
            // readNBytes (Java 9+) fills the whole buffer unless the stream ends first
            while ((numBytesRead = ais.readNBytes(partBytes, 0, partBytes.length)) > 0) {
                ByteArrayInputStream bais = new ByteArrayInputStream(partBytes, 0, numBytesRead);
                // The AudioInputStream length argument is in sample frames, not bytes
                long numFrames = numBytesRead / bytesPerFrame;
                try (AudioInputStream part = new AudioInputStream(bais, format, numFrames)) {
                    AudioSystem.write(part, AudioFileFormat.Type.WAVE, new File(String.format("out-%03d.wav", ++seq)));
                }
            }
        }
    }
}

@wladimirleite
Member

By the way, I found a solution with MPlayer to extract a part of the audio (e.g. cut from the beginning until 60s), but that would require multiple calls to achieve what we are looking for.

@lfcnassif
Member Author

lfcnassif commented Feb 24, 2023

Great! Thank you @tc-wleite! A Java-only solution is much better!

@lfcnassif
Member Author

lfcnassif commented Feb 25, 2023

The code above seems to work fine with a few WAVs I tested here. Headers are properly created in front of each part, with the same values, except those related to data size, of course. In a few WAVs here, byte 16 (the first byte of the fmt chunk size field) changed from 0x12 to 0x10, which is good: it means the audio was converted to PCM, which is what we need.

PS1: bytes 40-43 also changed, which I also believe is related to the conversion to PCM.
PS2: they changed from the string "fact" to "data", shifted 2 bytes to the left (0x12 - 0x10), which confirms the audio was indeed non-PCM.
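
The header check described in this comment can be scripted instead of done in a hex viewer. The sketch below is only an illustration: the class name WavHeaderInspector and its methods are hypothetical, and it assumes the "fmt " chunk immediately follows the 12-byte RIFF/WAVE preamble, which is common but not guaranteed by the format. It reads the fmt chunk size at offset 16 (0x10 for plain PCM, 0x12 when an extension size field is present) and the format tag at offset 20 (1 means PCM).

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, not part of IPED. Assumes the "fmt " chunk starts at offset 12.
final class WavHeaderInspector {
    static final int MIN_HEADER_BYTES = 22;

    // Little-endian 32-bit fmt chunk size stored at offsets 16-19.
    static int fmtChunkSize(byte[] header) {
        return ByteBuffer.wrap(header).order(ByteOrder.LITTLE_ENDIAN).getInt(16);
    }

    // Little-endian 16-bit format tag stored at offsets 20-21 (1 = PCM).
    static int formatTag(byte[] header) {
        return ByteBuffer.wrap(header).order(ByteOrder.LITTLE_ENDIAN).getShort(20) & 0xFFFF;
    }

    static boolean isPlainPcm(byte[] header) {
        return fmtChunkSize(header) == 16 && formatTag(header) == 1;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java WavHeaderInspector <file.wav>");
            return;
        }
        byte[] header = new byte[MIN_HEADER_BYTES];
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            int n = in.readNBytes(header, 0, header.length);
            if (n < header.length) {
                System.out.println("File too short to contain a WAV header");
                return;
            }
        }
        System.out.printf("fmt size=0x%02x, format tag=%d, plain PCM=%b%n",
                fmtChunkSize(header), formatTag(header), isPlainPcm(header));
    }
}
```

Running it on an input file and on one of the generated parts should make the 0x12 to 0x10 change visible directly, without comparing raw bytes by hand.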

@rafael844

I once asked on the MPlayer forum, and the answer was:

With -ao pcm I could create WAV files, but I don't know the parameters to split them. Could you give me an example?
Again, I recommend a tool designed for it; there are quite a few (Audacity among others).
But you can use -ss and -endpos to set start and end.
However, these values will not be exact, so the files will only be split very approximately, and may overlap or have parts missing.
If that is not good enough, as I said, use a tool designed specifically for this.
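
For reference, the multi-call approach suggested in that answer could be driven from Java roughly as sketched below. This is an untested illustration under several assumptions: the class name MplayerSplitCommands is hypothetical, the total duration is assumed to be known in advance, and -endpos is assumed to act as a duration relative to -ss (my reading of the MPlayer man page, worth verifying against the embedded version). The sketch only builds the argument lists and does not launch any process. As the answer warns, the cut points would be approximate.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of IPED: builds one MPlayer command line per part.
final class MplayerSplitCommands {
    static List<List<String>> build(String input, int totalSeconds, int partSeconds) {
        List<List<String>> commands = new ArrayList<>();
        int seq = 0;
        for (int start = 0; start < totalSeconds; start += partSeconds) {
            // The last part may be shorter than partSeconds
            int duration = Math.min(partSeconds, totalSeconds - start);
            List<String> cmd = new ArrayList<>();
            cmd.add("mplayer");
            cmd.add("-vo");
            cmd.add("null");                    // no video output
            cmd.add("-ao");
            cmd.add("pcm:file=" + String.format("out-%03d.wav", ++seq));
            cmd.add("-ss");
            cmd.add(Integer.toString(start));   // seek to the start of this part
            cmd.add("-endpos");
            cmd.add(Integer.toString(duration)); // assumed to be relative to -ss
            cmd.add(input);
            commands.add(cmd);
        }
        return commands;
    }

    public static void main(String[] args) {
        for (List<String> cmd : build("in.wav", 150, 60)) {
            System.out.println(String.join(" ", cmd));
        }
    }
}
```

Each list could be passed to a ProcessBuilder, but that means one MPlayer process per part, which is the main drawback compared with the Java-only splitter posted earlier in this thread.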
