Support for large-v3 #1437
Comments
I came here to post this exact same issue, thank you @noe. From the looks of the commit, maybe you can try swapping in the model and see if that works.
If the model does get revved, will it be in the new GGUF format?
I tried to convert the pt to ggml but ran into some issues. If anyone wants to try:
large-v3.pt download code:
I tried using this script to convert large-v3.pt to a ggml file, but the output ggml model file doesn't seem to be correct.
@lanma what is wrong with the model file for you? Do you get errors?
Results on the same test.wav file:
- large-v2 ggml, transcribed with the zh language setting
- large-v3 ggml after conversion, transcribed with the zh language setting
- large-v3 ggml after conversion, transcribed with the en language setting

Hey, can you give this PR a spin and check if it's working alright? #1444
Hi, I have a problem with Polish. I also see a strange warning:
Of course, standard
Please make sure to download the latest version from the master branch and compile it yourself, as the version of whisper.cpp you're currently using is outdated. I can provide a Windows binary for testing purposes right here:
After updating my whisper.cpp to the latest version, everything is working well! Thank you, everyone.
OK, after upgrading, I now get an error when building the CoreML model:
Yes, there is a bug in the script. The input dimensions should not be hardcoded.
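To illustrate the point about hardcoded dimensions: large-v3 uses 128 mel bins instead of 80, so the conversion's input shape should come from the model's own dimensions. Below is a minimal sketch, assuming the dimensions are available as a dict like the `dims` entry stored in OpenAI's .pt checkpoints; `encoder_input_shape` is a hypothetical helper, not code from the CoreML conversion script.

```python
# Hypothetical helper (not the actual conversion script): derive the encoder
# input shape from the checkpoint dimensions instead of hardcoding 80 mel bins.

def encoder_input_shape(dims: dict, batch: int = 1) -> tuple:
    """Return (batch, n_mels, n_frames) for a Whisper encoder.

    `dims` is assumed to mirror the "dims" dict in OpenAI's .pt checkpoints
    (keys "n_mels" and "n_audio_ctx"). The encoder's stride-2 convolution
    halves the time axis, so n_audio_ctx positions need 2 * n_audio_ctx
    mel frames (3000 frames for a 30 s window with n_audio_ctx = 1500).
    """
    n_mels = dims["n_mels"]
    n_frames = 2 * dims["n_audio_ctx"]
    return (batch, n_mels, n_frames)


large_v2 = {"n_mels": 80, "n_audio_ctx": 1500}
large_v3 = {"n_mels": 128, "n_audio_ctx": 1500}
print(encoder_input_shape(large_v2))  # (1, 80, 3000)
print(encoder_input_shape(large_v3))  # (1, 128, 3000)
```

Reading the value from the model means the same script works for both v2 and v3 checkpoints without editing a constant.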
I tried changing it from 80 to 128 in this file, and now I get an error here:
The file is generated, but transcription is not working :/ Any suggestions?
I have CoreML working with large-v3 now: #1458
@jxy I can confirm... it works!
In my test, it begins to repeat the same phrase after ~17 minutes of transcribing.
I gave the large model a 25-minute audio file. It broke down at about the 16 to 17 minute mark, but got back on track at around 18 minutes. I tried again and it broke down at the 8 minute mark. It worked fine with large-v2 and medium.en, so I guess something is still wrong.
Do you notice the same phenomenon with OpenAI's Whisper?
With OpenAI's Whisper, there are occasional repetitions of the previous sentence during a gap of silence in the audio, but it does not break down into endless repetitions.
OpenAI's Whisper uses a different strategy to overcome repetitions, based on compressing the transcript, so it probably works better than
To reduce repetitions, you can try increasing the number of beams and/or the entropy threshold:
Or disable the context (not recommended, since we lose other nice qualities of the model):
However, I find it worrying that even the OG implementation repeats more and also sometimes produces invalid characters (#1444 (comment))
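For context on why raising the entropy threshold can help: the check measures how varied the most recently decoded tokens are, and a repetition loop keeps reusing a handful of tokens, so its entropy is low. The sketch below approximates that idea and is not whisper.cpp's actual code; the 32-token window is an assumption, and the 2.4 threshold matches the CLI's default `--entropy-thold` of 2.40. Raising the threshold makes the check stricter, so more segments are flagged as failed and re-decoded.

```python
import math
from collections import Counter


def token_entropy(token_ids, window=32):
    """Shannon entropy (in nats) of the token-id distribution over the last `window` tokens."""
    tail = token_ids[-window:]
    if not tail:
        return 0.0
    counts = Counter(tail)
    n = len(tail)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def looks_repetitive(token_ids, entropy_thold=2.4, window=32):
    """Flag a segment as a likely repetition loop when its token entropy is below the threshold."""
    return token_entropy(token_ids, window) < entropy_thold


loop = [101, 102, 103, 104] * 8  # 4 distinct ids: entropy = ln(4), about 1.39
varied = list(range(32))         # 32 distinct ids: entropy = ln(32), about 3.47
print(looks_repetitive(loop))                     # True
print(looks_repetitive(varied))                   # False
print(looks_repetitive(loop, entropy_thold=1.0))  # False
```

The last line shows the flip side: with a lower threshold, the same loop would no longer be caught.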
It didn't help in my case. It went into endless repetition of two lines after 12 minutes.
Yes, this fixed the repetition, but it also added hallucinated
Would calling zlib to find the compression ratio of the transcript be the solution to our issues here? In addition, I also see those "♪ ♪". I thought the code explicitly filtered out these tokens. They don't appear when using OpenAI's whisper package.
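For reference, OpenAI's whisper package computes the compression ratio with zlib along these lines: raw UTF-8 size divided by compressed size. With its default `compression_ratio_threshold` of 2.4, a decoded segment whose ratio exceeds the threshold is treated as failed and retried at a higher temperature. This is a minimal sketch; `needs_fallback` is a hypothetical name used only here.

```python
import zlib


def compression_ratio(text: str) -> float:
    """Raw UTF-8 size divided by zlib-compressed size; repetitive text compresses well."""
    text_bytes = text.encode("utf-8")
    return len(text_bytes) / len(zlib.compress(text_bytes))


def needs_fallback(text: str, threshold: float = 2.4) -> bool:
    """Hypothetical helper: treat a segment as a failed decode when it compresses too well."""
    return compression_ratio(text) > threshold


normal = "The quick brown fox jumps over the lazy dog."
looped = "Thank you for watching. " * 20
print(needs_fallback(normal), needs_fallback(looped))  # False True
```

A repetition loop like `looped` compresses far better than ordinary speech, which is exactly what the ratio detects.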
This is possible because the compression ratio is one of two key thresholds that help decide if context should be carried forward into the next decoding cycle.
The code for filtering out these non-speech tokens exists, but it's currently disabled by default. It definitely warrants further investigation.
Works great with CoreML on MacBook Air M1.
Is it correct that the ggml-large-encoder.mlmodelc.zip for large-v3 has not been uploaded yet?
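Filtering non-speech tokens is typically done by suppressing them at the logit level, so the decoder can never select them. The sketch below uses a made-up toy vocabulary and logits, and only the musical symbols from the longer symbol list OpenAI's tokenizer uses; it illustrates the technique rather than whisper.cpp's implementation.

```python
import math

# Illustrative subset only; OpenAI's tokenizer suppresses a longer list of symbols.
NON_SPEECH_CHARS = set("♩♪♫♬♭♮♯")


def is_non_speech(token_text: str) -> bool:
    """True if the token, ignoring surrounding whitespace, consists only of non-speech symbols."""
    stripped = token_text.strip()
    return bool(stripped) and all(ch in NON_SPEECH_CHARS for ch in stripped)


def suppress_non_speech(logits, vocab):
    """Return a copy of `logits` with non-speech tokens set to -inf so they can never be picked."""
    return [
        -math.inf if is_non_speech(vocab[i]) else logit
        for i, logit in enumerate(logits)
    ]


# Toy vocabulary and logits (made up for this example).
vocab = [" hello", " ♪", " world", "♪♪", " ."]
logits = [1.0, 3.0, 0.5, 2.0, 0.1]
filtered = suppress_non_speech(logits, vocab)
best = max(range(len(filtered)), key=filtered.__getitem__)
print(repr(vocab[best]))  # ' hello'
```

Without suppression, greedy decoding would pick `" ♪"` here because it has the highest logit.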
Hi, it looks like large-v3 has been uploaded:
Thanks for the advice. Doesn't the mlmodelc need to be updated for v3?
@solaoi You have to do it yourself with convert-whisper-to-coreml.py
I haven't uploaded it. I would appreciate it if somebody made a PR with the updated large CoreML model to replace the old one and renamed the old one to "-v2".
@ggerganov |
Thank you @solaoi!
Maybe this issue can be closed?
OpenAI released their large-v3 Whisper model: openai/whisper#1762. It would be great for whisper.cpp to support it.