
Inferencing result different from original whisper with GPU even when using same model #256

Closed
ssteo opened this issue Dec 10, 2022 · 8 comments
Labels: question (Further information is requested)

Comments


ssteo commented Dec 10, 2022

Is there any parameter that needs to be added to the implementation, like in https://github.com/openai/whisper/tree/main/whisper/assets/multilingual ?

I've tested all models and found that the inference results differ from those of the original whisper running on GPU.
I'm wondering whether something is missing in my setup, or whether there is some difference in this project's implementation?


misutoneko commented Dec 11, 2022

I don't know this in detail, but it's a different implementation. I found this bit from the original announcement:

Just a note that the whisper.cpp implementation currently only supports the greedy sampling strategy, so to make a fair comparison with PyTorch, you would need to disable the beam search when running it.

(That's from October though, so I'm not sure if it still applies... things move fast.)
The original whisper itself gives you different results depending on the options (beam size etc.), and apparently there is the possibility of nondeterminism in play as well.


RYucel commented Dec 11, 2022

I also found differences in WER between the large models of PyTorch whisper and whisper.cpp. whisper.cpp got a worse WER in my tests with the large model (i.e. 12% vs 18% WER). Is there any way to bring whisper.cpp to the same level of accuracy through settings? A naive question, but I'm only just learning.

ggerganov added the question label Dec 11, 2022
ggerganov (Owner) commented

The decoding strategy in whisper.cpp is not exactly the same as the one in the original OpenAI repo. Differences can be expected and likely whisper.cpp is inferior atm. In any case, if you want to make fair comparisons between the two, make sure to run the PyTorch version using the Greedy decoder as explained in the README.

@RYucel
Can you give a tutorial for computing WER? Are you running the PyTorch implementation with the Greedy decoder?
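For reference, a minimal sketch of computing WER in Python with the jiwer package (jiwer is an assumption here, not a tool anyone in this thread said they used):

    # pip install jiwer
    import jiwer

    reference  = "the quick brown fox jumps over the lazy dog"
    hypothesis = "the quick brown fox jumped over the lazy dog"

    # WER = (substitutions + deletions + insertions) / number of words in the reference
    error_rate = jiwer.wer(reference, hypothesis)
    print(f"WER: {error_rate:.2%}")  # one substitution over nine words -> 11.11%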


bmilde commented Jan 3, 2023

I've encountered this as well with the whisper command line vs. using whisper from a Python script (the two have different defaults), see here:

openai/whisper#591

The default parameters that the python whisper command line tool uses are:

result = model.transcribe("audio.mp3", language=language, task='transcribe', temperature=(0.0, 0.2, 0.4, 0.6, 0.8, 1.0), best_of=5, beam_size=5, suppress_tokens="-1", condition_on_previous_text=True, fp16=True, compression_ratio_threshold=2.4, logprob_threshold=-1., no_speech_threshold=0.6)

The biggest differences are that the Python whisper decoder does beam search, conditions each segment on the preceding ones, and falls back to higher temperatures when the compression ratio signals likely faulty output (see the example in the whisper discussion link). whisper.cpp already mentions it doesn't do beam search; my guess is it doesn't do any of the other things either.

You can also check whether the outputs become more similar if you set best_of=1, beam_size=1 or best_of=None, beam_size=None, basically making Python whisper do greedy decoding too.
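A minimal sketch of that greedy setup with the openai-whisper Python package (the file name and model size are placeholders; treat this as an illustration, not a verified recipe):

    import whisper

    model = whisper.load_model("base")

    # Greedy decoding: no beam search, no best-of sampling, temperature fixed at 0
    result = model.transcribe(
        "audio.mp3",
        temperature=0.0,
        best_of=None,
        beam_size=None,
        condition_on_previous_text=True,
    )
    print(result["text"])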

ggerganov (Owner) commented

With the latest version the whisper.cpp results should be better and hopefully closer to the Python implementation.

By default, the main example corresponds to:

  • temperature=(0.0, 0.2, 0.4, 0.6, 0.8, 1.0)
  • best_of=5
  • beam_size=None
  • suppress_tokens="-1"
  • condition_on_previous_text=True
  • fp16=True
  • compression_ratio_threshold=2.4
  • logprob_threshold=-1.

You can enable beam search via --beam_size 5; it is disabled by default.
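For an apples-to-apples comparison, the Python transcribe() call that roughly mirrors those defaults would presumably look like this (a sketch assuming the parameters map one-to-one, not a verified equivalence):

    import whisper

    model = whisper.load_model("base")

    # Roughly mirrors the whisper.cpp `main` defaults listed above:
    # temperature fallback, best_of=5 sampling, beam search disabled.
    result = model.transcribe(
        "audio.mp3",
        temperature=(0.0, 0.2, 0.4, 0.6, 0.8, 1.0),
        best_of=5,
        beam_size=None,
        suppress_tokens="-1",
        condition_on_previous_text=True,
        fp16=True,
        compression_ratio_threshold=2.4,
        logprob_threshold=-1.0,
    )
    print(result["text"])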


o0101 commented Feb 10, 2023

Hi @ggerganov, you have done something phenomenal with this work! Sorry to comment on a closed issue, but I was wondering: is there any switch to set --condition_on_previous_text to False?

ggerganov (Owner) commented

@crisdosyago
Passing --max-context 0 to main should be equivalent to --condition_on_previous_text False


o0101 commented Feb 15, 2023

Thank you, sir!
