
Inferencing result different from original whisper with GPU even when using same model #256

Closed
ssteo opened this issue Dec 10, 2022 · 8 comments
Labels: question (Further information is requested)

Comments


ssteo commented Dec 10, 2022

Is there any parameter that needs to be added to the implementation, like in https://github.com/openai/whisper/tree/main/whisper/assets/multilingual ?

I've tested all models and found that the inference results differ from those of the original whisper running on GPU.
I'm wondering whether something is missing in my setup, or whether there is some difference in this project's implementation?


misutoneko commented Dec 11, 2022

I don't know this in detail, but it's a different implementation. I found this bit from the original announcement:

Just a note that the whisper.cpp implementation currently only supports the greedy sampling strategy, so to make a fair comparison with PyTorch, you would need to disable the beam search when running it.

(That's from October though, so I'm not sure if it still applies... things move fast.)
The original whisper itself gives you different results depending on the options (beam size etc.), and apparently there is the possibility of nondeterminism in play as well.


RYucel commented Dec 11, 2022

I also found differences in WER between the large models of PyTorch whisper and whisper.cpp. whisper.cpp got a worse WER in my tests with the large model (i.e. 12% vs 18% WER). Is there any way to bring whisper.cpp to the same level of accuracy through settings? A naive question, but I'm only just learning.

ggerganov added the question label Dec 11, 2022
ggerganov (Owner) commented

The decoding strategy in whisper.cpp is not exactly the same as the one in the original OpenAI repo. Differences can be expected and likely whisper.cpp is inferior atm. In any case, if you want to make fair comparisons between the two, make sure to run the PyTorch version using the Greedy decoder as explained in the README.

@RYucel
Can you give a tutorial for computing WER? Are you running the PyTorch implementation with the Greedy decoder?
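For reference, a minimal sketch of computing WER in Python with the jiwer package (jiwer is an assumption here, not a tool anyone in this thread said they used):

    # pip install jiwer
    import jiwer

    reference  = "the quick brown fox jumps over the lazy dog"
    hypothesis = "the quick brown fox jumped over the lazy dog"

    # WER = (substitutions + deletions + insertions) / number of words in the reference
    error_rate = jiwer.wer(reference, hypothesis)
    print(f"WER: {error_rate:.2%}")  # one substitution over nine words -> 11.11%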


bmilde commented Jan 3, 2023

I've encountered this as well with the whisper command line vs. using whisper from a Python script (the two have different defaults), see here:

openai/whisper#591

The default parameters that the python whisper command line tool uses are:

result = model.transcribe("audio.mp3", language=language, task='transcribe', temperature=(0.0, 0.2, 0.4, 0.6, 0.8, 1.0), best_of=5, beam_size=5, suppress_tokens="-1", condition_on_previous_text=True, fp16=True, compression_ratio_threshold=2.4, logprob_threshold=-1., no_speech_threshold=0.6)

The biggest differences are that the Python whisper decoder does beam search, conditions each segment on the preceding ones, and falls back to higher temperatures when the compression ratio signals likely faulty output (see the example in the whisper discussion link). whisper.cpp already mentions it doesn't do beam search; my guess is it doesn't do any of the other things either.

You can also check whether the outputs become more similar if you set best_of=1, beam_size=1 or best_of=None, beam_size=None, basically making Python whisper do greedy decoding too.
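A minimal sketch of that greedy setup with the openai-whisper Python package (the file name and model size are placeholders; treat this as an illustration, not a verified recipe):

    import whisper

    model = whisper.load_model("base")

    # Greedy decoding: no beam search, no best-of sampling, temperature fixed at 0
    result = model.transcribe(
        "audio.mp3",
        temperature=0.0,
        best_of=None,
        beam_size=None,
        condition_on_previous_text=True,
    )
    print(result["text"])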

ggerganov (Owner) commented

With the latest version the whisper.cpp results should be better and hopefully closer to the Python implementation.

By default, the main example corresponds to:

  • temperature=(0.0, 0.2, 0.4, 0.6, 0.8, 1.0)
  • best_of=5
  • beam_size=None
  • suppress_tokens="-1"
  • condition_on_previous_text=True
  • fp16=True
  • compression_ratio_threshold=2.4
  • logprob_threshold=-1.

You can enable beam search via --beam_size 5; it is disabled by default.
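For an apples-to-apples comparison, the Python transcribe() call that roughly mirrors those defaults would presumably look like this (a sketch assuming the parameters map one-to-one, not a verified equivalence):

    import whisper

    model = whisper.load_model("base")

    # Roughly mirrors the whisper.cpp `main` defaults listed above:
    # temperature fallback, best_of=5 sampling, beam search disabled.
    result = model.transcribe(
        "audio.mp3",
        temperature=(0.0, 0.2, 0.4, 0.6, 0.8, 1.0),
        best_of=5,
        beam_size=None,
        suppress_tokens="-1",
        condition_on_previous_text=True,
        fp16=True,
        compression_ratio_threshold=2.4,
        logprob_threshold=-1.0,
    )
    print(result["text"])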


o0101 commented Feb 10, 2023

Hi @ggerganov, you have done something phenomenal with this work! Sorry to comment on a closed issue, but I was wondering: is there any switch to set --condition_on_previous_text to False?

ggerganov (Owner) commented

@crisdosyago
Passing --max-context 0 to main should be equivalent to --condition_on_previous_text False


o0101 commented Feb 15, 2023

Thank you, sir!
