Support for large-v3 #1437

Closed
noe opened this issue Nov 6, 2023 · 32 comments

@noe

noe commented Nov 6, 2023

OpenAI released their large-v3 whisper model: openai/whisper#1762

It would be great for whisper.cpp to support it.

@dataf3l

dataf3l commented Nov 6, 2023

I came here to post this exact same issue. Thank you, @noe. From the looks of the commit, maybe you can try swapping the model and see if that works.

bobqianic added the "high priority" (Very important issue) label on Nov 6, 2023
@cspenn

cspenn commented Nov 6, 2023

If the model does get revved, will it be in the new GGUF format?

@jordibruin

I tried to convert the pt to ggml but ran into some issues. If anyone wants to try:

https://openaipublic.azureedge.net/main/whisper/models/e5b1a55b89c1367dacf97e3e19bfd829a01529dbfdeefa8caeb59b3f1b81dadb/large-v3.pt

@lanma

lanma commented Nov 7, 2023

large-v3.pt download code:

import whisper
model = whisper.load_model("large-v3")

I tried using this script to convert large-v3.pt to a ggml file, but the output ggml model file does not seem to be correct.
python models/convert-pt-to-ggml.py ~/.cache/whisper/large-v3.pt ../whisper ./models/
(Output file: ./models/ggml-model.bin)
Any suggestions?

@jordibruin

@lanma what is wrong with the model file for you? Do you get errors?

@lanma

lanma commented Nov 7, 2023

Results for the same test.wav file:

Large-v2 ggml, transcribing with the zh language setting:
main: processing './test.wav' (36096 samples, 2.3 sec), 4 threads, 1 processors, lang = zh, task = transcribe, timestamps = 1 ...
ggml_metal_add_buffer: allocated 'kv_self_1 ' buffer, size = 70.02 MB, ( 3629.22 / 21845.34)
[00:00:00.000 --> 00:00:02.000] 這是一個中文測試

Large-v3 ggml after conversion, transcribing with the zh language setting:
system_info: n_threads = 4 / 10 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | METAL = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | SSSE3 = 0 | VSX = 0 | COREML = 0 | OPENVINO = 0 |
main: WARNING: model is not multilingual, ignoring language and translation options
main: processing './test.wav' (36096 samples, 2.3 sec), 4 threads, 1 processors, lang = en, task = transcribe, timestamps = 1 ...
ggml_metal_add_buffer: allocated 'kv_self_1 ' buffer, size = 70.02 MB, ( 3629.77 / 21845.34)

Large-v3 ggml after conversion, transcribing with the en language setting:
system_info: n_threads = 4 / 10 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | METAL = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | SSSE3 = 0 | VSX = 0 | COREML = 0 | OPENVINO = 0 |
main: processing './test.wav' (36096 samples, 2.3 sec), 4 threads, 1 processors, lang = en, task = transcribe, timestamps = 1 ...
ggml_metal_add_buffer: allocated 'kv_self_1 ' buffer, size = 70.02 MB, ( 3629.77 / 21845.34)
[00:00:00.040 --> 00:00:30.000] sifications 20 It

@bobqianic
Collaborator

Hey, can you give this PR a spin and check if it's working alright? #1444

@piotr-sikora-v

piotr-sikora-v commented Nov 7, 2023

Hi, I have a problem with the Polish language.
After changing to the new large model, nothing is generated.
I ran the download and after that ./models/generate-coreml-model.sh

Also, I see a strange warning:

main: WARNING: model is not multilingual, ignoring language and translation options

Of course, the standard openai-whisper works without problems on large-v3 with the same file and the Polish language.

@bobqianic
Collaborator

bobqianic commented Nov 7, 2023

Please make sure to download the latest version from the master branch and compile it yourself, as the version of whisper.cpp you're currently using is outdated.

I can provide a Windows binary for testing purposes right here:
whisper.cpp-39a240b-win64-openblas.zip

@lanma

lanma commented Nov 8, 2023

After updating my whisper.cpp to the latest version, everything is working well! Thank you, everyone.
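
A note on why an outdated build might print "model is not multilingual" for large-v3. The sketch below is only an illustration under a stated assumption: that the older code recognised multilingual models by an exact vocabulary size of 51865. The logs in this thread report n_vocab = 51866 for large-v3, which would fail such an exact comparison, while a "greater than or equal" check would pass. The function names here are hypothetical and are not taken from whisper.cpp.

```python
# Hypothetical illustration; these helpers are NOT whisper.cpp code.
# Assumption: multilingual Whisper models before large-v3 have n_vocab == 51865,
# and English-only models have n_vocab == 51864.
MULTILINGUAL_VOCAB = 51865
LARGE_V3_VOCAB = 51866  # n_vocab reported for large-v3 in this thread


def is_multilingual_exact(n_vocab: int) -> bool:
    """Exact-match check (assumed behaviour of an outdated build)."""
    return n_vocab == MULTILINGUAL_VOCAB


def is_multilingual_relaxed(n_vocab: int) -> bool:
    """Relaxed check that also accepts larger vocabularies such as large-v3."""
    return n_vocab >= MULTILINGUAL_VOCAB


if __name__ == "__main__":
    for name, n_vocab in [("large-v2", MULTILINGUAL_VOCAB), ("large-v3", LARGE_V3_VOCAB)]:
        print(name, is_multilingual_exact(n_vocab), is_multilingual_relaxed(n_vocab))
```

Under that assumption, the exact check rejects large-v3, which would explain both the warning and the language setting silently falling back to en.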

@piotr-sikora-v

piotr-sikora-v commented Nov 8, 2023

OK, after upgrading I now get an error when building the CoreML model:

# ./models/generate-coreml-model.sh large
ModelDimensions(n_mels=128, n_audio_ctx=1500, n_audio_state=1280, n_audio_head=20, n_audio_layer=32, n_vocab=51866, n_text_ctx=448, n_text_state=1280, n_text_head=20, n_text_layer=32)
Traceback (most recent call last):
  File "/xxxxx/whisper.cpp/models/convert-whisper-to-coreml.py", line 323, in <module>
    encoder = convert_encoder(hparams, encoder, quantize=args.quantize)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/xxxxx/whisper.cpp/models/convert-whisper-to-coreml.py", line 257, in convert_encoder
    traced_model = torch.jit.trace(model, input_data)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/torch/jit/_trace.py", line 798, in trace
    return trace_module(
           ^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/torch/jit/_trace.py", line 1065, in trace_module
    module._c._create_method_from_trace(
  File "/opt/homebrew/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1508, in _slow_forward
    result = self.forward(*input, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/whisper/model.py", line 162, in forward
    x = F.gelu(self.conv1(x))
               ^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1508, in _slow_forward
    result = self.forward(*input, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/torch/nn/modules/conv.py", line 310, in forward
    return self._conv_forward(input, self.weight, self.bias)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/whisper/model.py", line 48, in _conv_forward
    return super()._conv_forward(
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/torch/nn/modules/conv.py", line 306, in _conv_forward
    return F.conv1d(input, weight, bias, self.stride,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Given groups=1, weight of size [1280, 128, 3], expected input[1, 80, 3000] to have 128 channels, but got 80 channels instead
coremlc: error: Model does not exist at models/coreml-encoder-large.mlpackage -- file:///xxxxx/whisper.cpp/
mv: rename models/coreml-encoder-large.mlmodelc to models/ggml-large-encoder.mlmodelc: No such file or directory

@bobqianic
Collaborator

Yes, there is a bug in the script. The input dimensions should not be hardcoded.
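
To illustrate what "not hardcoded" could look like, here is a minimal sketch that derives the encoder's dummy input shape from the model dimensions instead of assuming 80 mel bins. It is not the actual conversion script: the `ModelDimensions` dataclass below is a cut-down stand-in, and `encoder_input_shape` is a hypothetical helper. It relies on the fact that the Whisper encoder halves the number of frames (stride-2 convolution), so the input has 2 * n_audio_ctx frames.

```python
from dataclasses import dataclass


@dataclass
class ModelDimensions:
    """Cut-down stand-in holding only the fields needed for this sketch."""
    n_mels: int
    n_audio_ctx: int


def encoder_input_shape(dims: ModelDimensions, batch: int = 1) -> tuple:
    """Return (batch, n_mels, n_frames) for a dummy encoder input.

    The encoder downsamples frames by 2, so n_frames = 2 * n_audio_ctx.
    """
    return (batch, dims.n_mels, 2 * dims.n_audio_ctx)


if __name__ == "__main__":
    large_v2 = ModelDimensions(n_mels=80, n_audio_ctx=1500)
    large_v3 = ModelDimensions(n_mels=128, n_audio_ctx=1500)
    print(encoder_input_shape(large_v2))  # shape with 80 mel channels
    print(encoder_input_shape(large_v3))  # shape with 128 mel channels
```

With the dimensions printed in the traceback above (n_mels=128, n_audio_ctx=1500), this yields 128 input channels, matching what the conv1 weight of size [1280, 128, 3] expects.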

@piotr-sikora-v

piotr-sikora-v commented Nov 8, 2023

I tried changing it from 80 to 128 in this file, and after that I now get an error here:

/opt/homebrew/lib/python3.11/site-packages/whisper/model.py:166: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert x.shape[1:] == self.positional_embedding.shape, "incorrect audio shape"

The file is generated, but transcription is not working :/

Any suggestions?

@jxy
Contributor

jxy commented Nov 8, 2023

I have coreml working with large-v3 now, #1458

$ ./main -m models/ggml-large.bin -f samples/jfk.wav            
whisper_init_from_file_with_params_no_state: loading model from 'models/ggml-large.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab       = 51866
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 1280
whisper_model_load: n_audio_head  = 20
whisper_model_load: n_audio_layer = 32
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 1280
whisper_model_load: n_text_head   = 20
whisper_model_load: n_text_layer  = 32
whisper_model_load: n_mels        = 128
whisper_model_load: ftype         = 1
whisper_model_load: qntvr         = 0
whisper_model_load: type          = 5 (large v3)
whisper_model_load: adding 1609 extra tokens
whisper_model_load: n_langs       = 100
whisper_model_load: model ctx     = 2951.63 MB
whisper_model_load: model size    = 2951.01 MB
whisper_init_state: kv self size  =   70.00 MB
whisper_init_state: kv cross size =  234.38 MB
whisper_init_state: loading Core ML model from 'models/ggml-large-encoder.mlmodelc'
whisper_init_state: first run on a device may take a while ...
whisper_init_state: Core ML model loaded
whisper_init_state: compute buffer (conv)   =   10.35 MB
whisper_init_state: compute buffer (cross)  =    8.89 MB
whisper_init_state: compute buffer (decode) =   59.40 MB
whisper_init_state: Metal context initialized
whisper_init_state: max tensor size =   126.63 MB

system_info: n_threads = 4 / 8 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | METAL = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | SSSE3 = 0 | VSX = 0 | COREML = 1 | OPENVINO = 0 | 

main: processing 'samples/jfk.wav' (176000 samples, 11.0 sec), 4 threads, 1 processors, lang = en, task = transcribe, timestamps = 1 ...


[00:00:00.000 --> 00:00:03.000]   And so my fellow Americans,
[00:00:03.000 --> 00:00:08.000]   ask not what your country can do for you,
[00:00:08.000 --> 00:00:11.000]   ask what you can do for your country.


whisper_print_timings:     load time =  1287.51 ms
whisper_print_timings:     fallbacks =   0 p /   0 h
whisper_print_timings:      mel time =     5.79 ms
whisper_print_timings:   sample time =     9.90 ms /    31 runs (    0.32 ms per run)
whisper_print_timings:   encode time =  1238.65 ms /     1 runs ( 1238.65 ms per run)
whisper_print_timings:   decode time =   861.43 ms /    30 runs (   28.71 ms per run)
whisper_print_timings:   prompt time =    45.03 ms /     1 runs (   45.03 ms per run)
whisper_print_timings:    total time =  6412.64 ms

@piotr-sikora-v

@jxy I can confirm... It works!

@Ajaja

Ajaja commented Nov 8, 2023

In my test it begins to repeat the same phrase after ~17 minutes of transcribing.
I tried to use whisper-bin-x64 and whisper-cublas-bin-x64 from https://github.com/ggerganov/whisper.cpp/actions/runs/6800894353
Output: out.txt
(audio from https://www.realm.fm/episodes/marigold-breach-podcast-s1-e3/reader?media_type=audio)

@jxy
Contributor

jxy commented Nov 8, 2023

I gave the large model a 25 min audio file. It broke down at about the 16 to 17 min mark, but got back on track at around 18 min. I tried again and it broke down at the 8 min mark.

It worked fine with large-v2 and medium.en.

So I guess something is still wrong.

@bobqianic
Collaborator

So I guess something is still wrong.

Do you notice the same phenomenon with OpenAI's Whisper?

@jxy
Contributor

jxy commented Nov 9, 2023

With OpenAI's Whisper, there are occasional repetitions of the previous sentence during a gap of silence in the audio, but it does not break down into endless repetitions.

@ggerganov
Owner

With OpenAI's Whisper, there are occasional repetitions of the previous sentence during a gap of silence in the audio, but it does not break down into endless repetitions.

OpenAI's Whisper uses a different strategy to overcome repetitions, based on compressing the transcript, so it probably works better than whisper.cpp's entropy-based strategy in this case.

To reduce repetitions, you can either try to increase the number of beams and / or the entropy threshold:

-bs 5 -bo 5 -et 2.8

Or disable the context (not recommended, since we lose other nice qualities of the model):

-mc 0

However, I find it worrying that even the OG implementation repeats more and also sometimes produces invalid characters (#1444 (comment))
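
As a rough illustration of what an entropy-based repetition check can look like, the sketch below computes the Shannon entropy of the token frequencies in a recent window and flags the output when the entropy drops below a threshold. This is a sketch of the general idea only, not whisper.cpp's implementation: the function names and the 32-token window are assumptions, and the 2.8 default simply mirrors the -et value suggested above.

```python
import math
from collections import Counter


def token_entropy(tokens, window=32):
    """Shannon entropy (natural log) of token frequencies in the last `window` tokens."""
    recent = tokens[-window:]
    if not recent:
        return 0.0
    total = len(recent)
    counts = Counter(recent)
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def looks_repetitive(tokens, threshold=2.8, window=32):
    """Flag the sequence when its recent entropy falls below the threshold.

    A higher threshold flags more sequences, which is why raising it can
    reduce repetitions at the cost of more fallbacks.
    """
    return token_entropy(tokens, window) < threshold


if __name__ == "__main__":
    looping = [101, 102] * 16      # two tokens repeated over and over
    varied = list(range(32))       # every token distinct
    print(looks_repetitive(looping), looks_repetitive(varied))
```

A window that alternates between two tokens has entropy ln 2 (about 0.69) and is flagged, while 32 distinct tokens give ln 32 (about 3.47) and pass.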

@Ajaja

Ajaja commented Nov 9, 2023

To reduce repetitions, you can either try to increase the number of beams and / or the entropy threshold:
-bs 5 -bo 5 -et 2.8

Didn't help in my case. It went into endless repetition of two lines after 12 minutes.

Or disable the context (not recommended, since we lose other nice qualities of the model):
-mc 0

Yes, this fixed the repetition. But it also added a hallucinated "I'll see you next time" at the end of the transcription. It was creepy )

[00:31:14.060 --> 00:31:15.660]   Cover art by Kendall Thomas.
[00:31:15.660 --> 00:31:18.400]   Executive in charge for Realm, Mary Azadulihi.
[00:31:19.000 --> 00:31:25.120]   Find more shows like Marigold Breach by following Realm on Apple Podcasts, Spotify, or at realm.fm.
[00:31:25.120 --> 00:31:28.120]   ♪ ♪
[00:31:28.120 --> 00:31:28.620]   you
[00:31:28.620 --> 00:31:30.680]   you
[00:31:30.680 --> 00:31:32.740]   you
[00:31:32.740 --> 00:31:34.360]   I'll see you next time.

output_srt: saving output to '01_v3.srt'

whisper_print_timings:     load time =  2219.45 ms
whisper_print_timings:     fallbacks =   5 p /   0 h
whisper_print_timings:      mel time =  1507.82 ms
whisper_print_timings:   sample time =  5514.81 ms /  6881 runs (    0.80 ms per run)
whisper_print_timings:   encode time = 431008.91 ms /    75 runs ( 5746.79 ms per run)
whisper_print_timings:   decode time = 381667.41 ms /  6792 runs (   56.19 ms per run)
whisper_print_timings:   prompt time =  7772.44 ms /    82 runs (   94.79 ms per run)
whisper_print_timings:    total time = 830079.44 ms

@jxy
Contributor

jxy commented Nov 9, 2023

Would calling zlib to find the compression ratio of the transcript be the solution to our issues here?

In addition, I also see those "♪ ♪". I thought the code explicitly filtered out these tokens. They don't appear when using OpenAI's whisper package.

@bobqianic
Collaborator

bobqianic commented Nov 9, 2023

Would calling zlib to find the compression ratio of the transcript be the solution to our issues here?

This is possible because the compression ratio is one of two key thresholds that help decide if context should be carried forward into the next decoding cycle.

In addition, I also see those "♪ ♪". I thought the code explicitly filtered out these tokens. They don't appear when using OpenAI's whisper package.

The code for filtering out these non-speaking tokens exists, but it's currently disabled by default. It definitely warrants further investigation.
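
For reference, the zlib-based compression ratio discussed above can be sketched in a few lines. The ratio is the UTF-8 byte length of the text divided by the length of its zlib-compressed form, so highly repetitive text scores high. The helper name `too_repetitive` is hypothetical, and the 2.4 default is taken from what I understand to be the default compression-ratio threshold in OpenAI's Python package; treat both as assumptions rather than whisper.cpp behaviour.

```python
import zlib


def compression_ratio(text: str) -> float:
    """Ratio of raw UTF-8 size to zlib-compressed size; higher means more repetitive."""
    data = text.encode("utf-8")
    return len(data) / len(zlib.compress(data))


def too_repetitive(text: str, threshold: float = 2.4) -> bool:
    """Hypothetical helper: flag a segment whose compression ratio exceeds the threshold."""
    return compression_ratio(text) > threshold


if __name__ == "__main__":
    normal = ("And so my fellow Americans, ask not what your country can do for you, "
              "ask what you can do for your country.")
    looping = "I'll see you next time. " * 50
    print(round(compression_ratio(normal), 2), too_repetitive(normal))
    print(round(compression_ratio(looping), 2), too_repetitive(looping))
```

A decoder could use such a check to decide whether to retry a segment or to drop the previous text from the context, which is the role the compression ratio plays in the discussion above.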

@matijagrcic

Works great with CoreML on MacBook Air M1.

@solaoi

solaoi commented Nov 11, 2023

Is it correct that the ggml-large-encoder.mlmodelc.zip for large-v3 has not been uploaded yet?
https://huggingface.co/ggerganov/whisper.cpp/tree/main

@cvhien

cvhien commented Nov 11, 2023

Is it correct that the ggml-large-encoder.mlmodelc.zip for large-v3 has not been uploaded yet? https://huggingface.co/ggerganov/whisper.cpp/tree/main

Hi, it's likely large-v3 has been uploaded:
"NOTE: re-download ggml-large.bin to get the v3 version,
ggml-large.bin is the new v3 model"
#1444 (comment)

@solaoi

solaoi commented Nov 12, 2023

Is it correct that the ggml-large-encoder.mlmodelc.zip for large-v3 has not been uploaded yet? https://huggingface.co/ggerganov/whisper.cpp/tree/main

Hi, it's likely large-v3 has been uploaded: "NOTE: re-download ggml-large.bin to get the v3 version, ggml-large.bin is the new v3 model" #1444 (comment)

Thanks for your advice.
There is no updated ggml-large-encoder.mlmodelc.zip for this commit.
https://huggingface.co/ggerganov/whisper.cpp/commit/bf8b606c2fcd9173605cdf6bd2ac8a75a8141b6c

Is it not necessary to update mlmodelc for v3?

@cspenn

cspenn commented Nov 12, 2023

@solaoi you have to do it yourself with convert-whisper-to-coreml.py

@ggerganov
Owner

ggerganov commented Nov 13, 2023

Is it correct that the ggml-large-encoder.mlmodelc.zip for large-v3 has not been uploaded yet? https://huggingface.co/ggerganov/whisper.cpp/tree/main

I haven't uploaded it. I would appreciate it if somebody made a PR with the updated large CoreML model to replace the old one, renaming the old one to "-v2".

@solaoi

solaoi commented Nov 25, 2023

@ggerganov
I have uploaded the updated large-v3 CoreML model
https://huggingface.co/ggerganov/whisper.cpp/discussions/14

@ggerganov
Owner

Thank you @solaoi !

@khimaros

Maybe this issue can be closed?
