silence removal for transcription implemented #1649

artshcherbina · 2023-12-16T15:37:23Z

Thank you for your great work!

I've added some simple logic to detect silence and process only real voice input.
Transcription starts once more than 1 second of silence has passed (you may need to tune --pressure-t for your microphone).
You can tune the silence duration with the --silence_t argument.
The transcribed text is copied to the clipboard (currently only on Linux, where xclip is used).
The code can be improved further if this seems reasonable.

----------
----------
----------
----------
9---------
99---------
99---------
3992---------
93992---------
93999---------
994999---------
9994999---------
69992994---------
769992994---------
9-9995999---------
-9-9996999---------
--9-9996999---------
--795999-995---------
---694999-995---------
----694999-996---------
-----929998999---------
------929999899---------
-------929999799---------
-------593999-998---------
--------493999-999---------
---------929999699---------
 Here is some English text
----------
----------
----------
----------
----------
----------
6---------
96---------
99---------
999---------
2997---------
9-997---------
99-997---------
7999992---------
-6998992---------
9-6998992---------
92-992998---------
492-993999---------
799-5997993---------
6799-5997993---------
59492-994999---------
459493-995999---------
4459493-995999---------
-755899-4995994---------
2-755889-4995994---------
-2-755989-4995994---------
--33559594-997999---------
---33559594-998999---------
---22754979-3994995---------
----22754979-3993995---------
------2754969-3993995---------
------32659596-999999---------
-------32658596-999999---------
--------32658696-899999---------
---------3744959-2992996---------
 During silence nothing is detected.
----------
----------
----------
----------
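
The gating described at the top of this comment can be sketched roughly as follows. This is a minimal Python illustration, not the code in this PR: it assumes audio arrives as fixed-length frames of float samples, that loudness is measured as RMS, and that `pressure_t` and `silence_t` behave like the --pressure-t and --silence_t arguments. The helper names `rms` and `voiced_segments` are hypothetical.

```python
import math
from typing import Iterable, Iterator, List

Frame = List[float]


def rms(frame: Frame) -> float:
    """Root-mean-square level of one frame of float samples."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def voiced_segments(
    frames: Iterable[Frame],
    frame_sec: float,
    pressure_t: float,
    silence_t: float = 1.0,
) -> Iterator[List[Frame]]:
    """Yield one utterance each time silence_t seconds of silence follow voice.

    Assumption: a frame counts as voice when its RMS is >= pressure_t.
    Leading silence is dropped, short pauses inside an utterance are kept,
    and the closing run of silent frames is trimmed before yielding.
    """
    segment: List[Frame] = []
    trailing_silent = 0
    for frame in frames:
        if rms(frame) >= pressure_t:
            segment.append(frame)
            trailing_silent = 0
        elif segment:
            segment.append(frame)
            trailing_silent += 1
            if trailing_silent * frame_sec >= silence_t:
                yield segment[:-trailing_silent]
                segment, trailing_silent = [], 0
    if segment:
        # Flush whatever voice is left when the input ends.
        yield segment[: len(segment) - trailing_silent]
```

Each yielded segment would then be handed to the transcriber as a single utterance, so nothing is processed while the input stays below the threshold.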

jensdraht1999 · 2024-09-24T02:39:26Z

@artshcherbina Hello, I think this is good, but I wanted to ask whether this can also be applied to the .srt output and so on. And if so, do the timestamps still align correctly with the original file?

I implemented something similar on Windows via ffmpeg and silence detection.

1.) Split the audio into 1-second samples.
2.) If the audio level is too low (50db), regard it as silence.
3.) Replace all audio samples that are regarded as silent with a sound sample (for example white noise, some other voice, or silent audio).
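
The three steps above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the ffmpeg script itself: samples are floats in [-1, 1], the level is measured as RMS in dBFS, the "50db" threshold is read as -50 dBFS, and the replacement is uniform noise of amplitude `noise_amp` (0.0 gives digital silence). The names `dbfs` and `replace_silence` are hypothetical.

```python
import math
import random
from typing import List


def dbfs(chunk: List[float]) -> float:
    """RMS level of float samples in [-1, 1], in dB relative to full scale."""
    if not chunk:
        return float("-inf")
    mean_sq = sum(s * s for s in chunk) / len(chunk)
    return 10 * math.log10(mean_sq) if mean_sq > 0 else float("-inf")


def replace_silence(
    samples: List[float],
    sample_rate: int,
    threshold_db: float = -50.0,  # assumed reading of "50db"
    noise_amp: float = 0.0,
    seed: int = 0,
) -> List[float]:
    """Replace every 1-second chunk quieter than threshold_db with noise."""
    rng = random.Random(seed)
    out: List[float] = []
    # Step 1: walk the signal in chunks of sample_rate samples (1 second).
    for start in range(0, len(samples), sample_rate):
        chunk = samples[start:start + sample_rate]
        # Step 2: treat the chunk as silence if its level is too low.
        if dbfs(chunk) < threshold_db:
            # Step 3: substitute a replacement signal of the same length.
            chunk = [rng.uniform(-noise_amp, noise_amp) for _ in chunk]
        out.extend(chunk)
    return out
```

Because each chunk is replaced rather than removed, the output has the same length as the input, so any timestamps computed from it still refer to positions in the original file.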

However, this unfortunately did not improve anything.

The sad thing is that Whisper tries to transcribe literally everything.

Do you have any audio samples that I could test my script with? I have a few scripts that I would like to try out and share with the community if they work well or better.

artshcherbina · 2024-09-24T07:40:36Z

Hello, @jensdraht1999.
I haven't worked with .srt files.
Currently, I don't have any samples.
You may record some samples with your mic for testing.

jensdraht1999 · 2024-09-24T15:58:53Z

> Hello, @jensdraht1999. I haven't worked with .srt files. Currently, I don't have any samples. You may record some samples with your mic for testing.

I wonder if you could resolve the rest of the issues and then merge it.

artshcherbina · 2024-09-25T07:23:02Z

My changes break some default behavior.
So I have no plan to merge this PR now.

auto transcription implemented

a1beb25

artshcherbina mentioned this pull request Dec 16, 2023

Support for realtime audio input #10

Closed

code tuned

f79b8ad

artshcherbina changed the title ~~auto transcription implemented~~ silence removal for transcription implemented Dec 16, 2023

Artem Shcherbina added 5 commits December 16, 2023 18:31

time printing added

067ab30

tuned

8ae32e5

step added

1d05f3c

simple text postprocessing added

8c2e1b2

silence_t added

c18e62c

dgm3333 mentioned this pull request Dec 19, 2023

Can real-time transcription be achieved? #1653

Open

artshcherbina marked this pull request as draft September 25, 2024 07:23
