silence removal for transcription implemented #1649

Draft
wants to merge 7 commits into
base: master
from

Conversation

artshcherbina

@artshcherbina artshcherbina commented Dec 16, 2023

Thank you for your great work!

I've added some simple logic to detect silence and process only real voice input.
Transcription starts once more than 1 second of silence has passed (you may need to tune --pressure-t for your microphone).
You can tune the silence duration with the --silence_t argument.
The transcribed text is copied to the clipboard (currently only on Linux; xclip is used).
The code can be improved further if this approach seems reasonable.

----------
----------
----------
----------
9---------
99---------
99---------
3992---------
93992---------
93999---------
994999---------
9994999---------
69992994---------
769992994---------
9-9995999---------
-9-9996999---------
--9-9996999---------
--795999-995---------
---694999-995---------
----694999-996---------
-----929998999---------
------929999899---------
-------929999799---------
-------593999-998---------
--------493999-999---------
---------929999699---------
 Here is some English text
----------
----------
----------
----------
----------
----------
6---------
96---------
99---------
999---------
2997---------
9-997---------
99-997---------
7999992---------
-6998992---------
9-6998992---------
92-992998---------
492-993999---------
799-5997993---------
6799-5997993---------
59492-994999---------
459493-995999---------
4459493-995999---------
-755899-4995994---------
2-755889-4995994---------
-2-755989-4995994---------
--33559594-997999---------
---33559594-998999---------
---22754979-3994995---------
----22754979-3993995---------
------2754969-3993995---------
------32659596-999999---------
-------32658596-999999---------
--------32658696-899999---------
---------3744959-2992996---------
 During silence nothing is detected.
----------
----------
----------
----------
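
For readers who want to see the idea in code, here is a minimal sketch of threshold-based silence gating as described above. It is not the code from this PR: the function names, the use of RMS as the loudness measure, and the parameters `level_threshold` and `silence_seconds` are assumptions made for illustration (they only loosely mirror what --pressure-t and --silence_t appear to control). Frames are plain lists of float samples in [-1.0, 1.0].

```python
import math
from typing import Iterable, Iterator, List

Frame = List[float]


def rms(frame: Frame) -> float:
    """Root-mean-square level of one frame of samples in [-1.0, 1.0]."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def speech_segments(
    frames: Iterable[Frame],
    frame_seconds: float,
    level_threshold: float,      # hypothetical stand-in for a loudness threshold
    silence_seconds: float = 1.0,  # hypothetical stand-in for a silence duration
) -> Iterator[List[Frame]]:
    """Yield runs of voiced frames, each closed by enough trailing silence.

    Silent frames are never stored, so only voiced audio reaches the caller.
    """
    segment: List[Frame] = []
    silent_frames = 0
    for frame in frames:
        if rms(frame) >= level_threshold:
            segment.append(frame)
            silent_frames = 0
        elif segment:
            # Count silence only after some speech has been collected.
            silent_frames += 1
            if silent_frames * frame_seconds >= silence_seconds:
                yield segment  # the caller would transcribe this segment
                segment = []
                silent_frames = 0
    if segment:
        # Flush speech that was still open when the input ended.
        yield segment
```

Each yielded segment would then be handed to the transcriber. Counting silent frames instead of summing float durations avoids rounding drift when the frame length is something like 0.1 s.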

@artshcherbina artshcherbina changed the title from "auto transcription implemented" to "silence removal for transcription implemented" Dec 16, 2023
@jensdraht1999

@artshcherbina Hello, I think this is good, but I wanted to ask whether it could also be applied to the .srt output and so on. If so, do the timestamps still align correctly with the original file?

I implemented something similar in Windows via ffmpeg and silence detection.

  1. Split the audio into 1-second samples.
  2. If a sample's level is too low (50 dB), treat it as silence.
  3. Replace every sample treated as silence with a filler sound (for example white noise, another voice, or silent audio).
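
The three steps above can be sketched roughly as follows. This is not the ffmpeg-based script mentioned in the comment, only a pure-Python illustration. It assumes that "50 dB" means a threshold of -50 dB relative to full scale, that samples are floats in [-1.0, 1.0], and that the filler is a short list of samples repeated to fit; the function names and parameters are hypothetical.

```python
import math
from typing import List


def level_dbfs(chunk: List[float]) -> float:
    """RMS level in dB relative to full scale (1.0); -inf for digital silence."""
    if not chunk:
        return float("-inf")
    r = math.sqrt(sum(s * s for s in chunk) / len(chunk))
    return 20.0 * math.log10(r) if r > 0.0 else float("-inf")


def replace_silent_chunks(
    samples: List[float],
    sample_rate: int,
    filler: List[float],
    threshold_db: float = -50.0,  # assumed reading of the "50 dB" threshold
) -> List[float]:
    """Replace each 1-second chunk quieter than threshold_db with filler audio."""
    if not filler:
        raise ValueError("filler must contain at least one sample")
    out: List[float] = []
    # Step 1: walk the audio in 1-second chunks (the last one may be shorter).
    for start in range(0, len(samples), sample_rate):
        chunk = samples[start:start + sample_rate]
        # Step 2: a chunk below the threshold counts as silence.
        if level_dbfs(chunk) < threshold_db:
            # Step 3: loop the filler so the chunk keeps its original length.
            chunk = [filler[i % len(filler)] for i in range(len(chunk))]
        out.extend(chunk)
    return out
```

Because every chunk keeps its length, the output has the same duration as the input, so any timestamps produced from it still refer to positions in the original file.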

Unfortunately, this did not improve anything.

The sad thing is that Whisper tries to transcribe literally everything.

Do you have any audio samples I could test with? I have a few scripts that I would like to try out and share with the community if they work well.

@artshcherbina
Author

Hello, @jensdraht1999.
I haven't worked with .srt files.
Currently, I don't have any samples.
You could record some samples with your mic for testing.

@jensdraht1999

> Hello, @jensdraht1999 . I haven't worked with .srt files. Currently, I don't have any samples. You may record some samples with your mic for testing

I wonder whether you could resolve the remaining issues and then merge it.

@artshcherbina
Author

My changes break some default behavior,
so I have no plans to merge this PR for now.

@artshcherbina artshcherbina marked this pull request as draft September 25, 2024 07:23

2 participants