Describe the bug

I installed the latest version via:
pip install git+https://github.com/aarnphm/whispercpp.git -vv
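As a sanity check that the interpreter actually picks up this freshly installed build, plain Python introspection works (nothing library-specific, just the standard module attribute):

import whispercpp

# Path of the imported package; confirms the pip-installed build above is the one in use.
print(whispercpp.__file__)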
Then I have this 1-minute-long wav file. Here's the output of the original whisper.cpp command:
./main -m models/ggml-small.bin -f out.wav --language auto --max-len 1

whisper_init_from_file_no_state: loading model from 'models/ggml-small.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab = 51865
whisper_model_load: n_audio_ctx = 1500
whisper_model_load: n_audio_state = 768
whisper_model_load: n_audio_head = 12
whisper_model_load: n_audio_layer = 12
whisper_model_load: n_text_ctx = 448
whisper_model_load: n_text_state = 768
whisper_model_load: n_text_head = 12
whisper_model_load: n_text_layer = 12
whisper_model_load: n_mels = 80
whisper_model_load: ftype = 1
whisper_model_load: qntvr = 0
whisper_model_load: type = 3
whisper_model_load: mem required = 743.00 MB (+ 16.00 MB per decoder)
whisper_model_load: adding 1608 extra tokens
whisper_model_load: model ctx = 464.68 MB
whisper_model_load: model size = 464.44 MB
whisper_init_state: kv self size = 15.75 MB
whisper_init_state: kv cross size = 52.73 MB

system_info: n_threads = 4 / 12 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 | COREML = 0 |

main: processing 'out.wav' (958952 samples, 59.9 sec), 4 threads, 1 processors, lang = auto, task = transcribe, timestamps = 1 ...

whisper_full_with_state: auto-detected language: ru (p = 0.993210)

# recognition results go here ...

whisper_print_timings: load time = 571.00 ms
whisper_print_timings: fallbacks = 0 p / 0 h
whisper_print_timings: mel time = 310.98 ms
whisper_print_timings: sample time = 378.25 ms / 426 runs ( 0.89 ms per run)
whisper_print_timings: encode time = 11926.32 ms / 4 runs ( 2981.58 ms per run)
whisper_print_timings: decode time = 9821.29 ms / 425 runs ( 23.11 ms per run)
whisper_print_timings: total time = 23272.83 ms
Total execution time is 23 seconds.
Here's my Python code, which uses this library:
import time
from whispercpp import Whisper

start = time.time()
w = Whisper.from_pretrained(model_name="/whispercpp/models/ggml-small.bin")
w.params.with_language("auto")
print(w.transcribe_from_file("out.wav"))
end = time.time()
print(end - start)
And here's the output on the same file:

whisper_init_from_file_no_state: loading model from '/whispercpp/models/ggml-small.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab = 51865
whisper_model_load: n_audio_ctx = 1500
whisper_model_load: n_audio_state = 768
whisper_model_load: n_audio_head = 12
whisper_model_load: n_audio_layer = 12
whisper_model_load: n_text_ctx = 448
whisper_model_load: n_text_state = 768
whisper_model_load: n_text_head = 12
whisper_model_load: n_text_layer = 12
whisper_model_load: n_mels = 80
whisper_model_load: f16 = 1
whisper_model_load: type = 3
whisper_model_load: mem required = 608.00 MB (+ 16.00 MB per decoder)
whisper_model_load: adding 1608 extra tokens
whisper_model_load: model ctx = 464.56 MB
whisper_model_load: model size = 464.44 MB
whisper_init_state: kv self size = 15.75 MB
whisper_init_state: kv cross size = 52.73 MB

whisper_full_with_state: auto-detected language: ru (p = 0.993206)

# recognition results go here...

183.6768798828125
Total execution time is 183 seconds.

That's roughly an 8x difference (23 s vs. 183 s) for the same model and file.
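One possibly relevant observation: the CLI run prints a system_info line (n_threads = 4 / 12, AVX2 = 1, FMA = 1, BLAS = 1, ...), while the binding prints nothing equivalent, so I can't tell whether the extension was built with the same SIMD/BLAS flags or how many threads it uses. For reference, here's a variant of the script above that at least times model loading and transcription separately; the thread-count setter is only a guess on my part and I haven't checked whether the params builder actually exposes one, so it's left commented out.

import time
from whispercpp import Whisper

# Time model loading separately from transcription, so it's clear where the extra ~160 s goes.
t0 = time.time()
w = Whisper.from_pretrained(model_name="/whispercpp/models/ggml-small.bin")
print(f"model load: {time.time() - t0:.1f} s")

w.params.with_language("auto")
# Hypothetical: pin the thread count to match the CLI run (4 threads).
# Unverified -- I don't know whether the params builder has such a setter.
# w.params.with_num_threads(4)

t1 = time.time()
print(w.transcribe_from_file("out.wav"))
print(f"transcription: {time.time() - t1:.1f} s")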
To reproduce

No response

Expected behavior

I'd expect performance on par with the original whisper.cpp.

Environment

MacBook Pro 16, 2.6 GHz 6-core Intel Core i7; 32 GB RAM
Python 3.10
Latest versions of whisper.cpp and this library as of 5 June 2023.
I have also experienced very slow transcription. Did you come up with a solution to this problem? Thanks.