
Add whisper.cpp (server) support to llamafile #517

Merged · 1 commit into Mozilla-Ocho:main · Jul 31, 2024

Conversation

@cjpais (Collaborator) commented Jul 30, 2024

This PR adds whisper.cpp support to llamafile, addressing #17 in part. Only the server binary has been ported in this PR.

Most of the work to support this was initially done on my fork of llamafile, whisperfile. This PR ports that code over and structures it similarly to the stable-diffusion.cpp support.

The whisper.cpp code was taken from commit 6739eb8.
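
For context, here is a minimal sketch of the whisper.cpp C API that the ported server wraps. This is not code from the PR; the model path is a placeholder and audio decoding is elided:

```cpp
#include <cstdio>
#include <vector>
#include "whisper.h"

int main() {
    // Load a whisper model; the path here is a placeholder.
    struct whisper_context_params cparams = whisper_context_default_params();
    struct whisper_context * ctx =
        whisper_init_from_file_with_params("models/ggml-base.en.bin", cparams);
    if (ctx == nullptr) return 1;

    // whisper.cpp expects 16 kHz mono float PCM; decoding the input
    // file into this buffer is elided here.
    std::vector<float> pcmf32;

    struct whisper_full_params wparams =
        whisper_full_default_params(WHISPER_SAMPLING_GREEDY);
    if (whisper_full(ctx, wparams, pcmf32.data(), (int) pcmf32.size()) == 0) {
        // Print the transcription segment by segment.
        for (int i = 0; i < whisper_full_n_segments(ctx); i++) {
            printf("%s", whisper_full_get_segment_text(ctx, i));
        }
        printf("\n");
    }

    whisper_free(ctx);
    return 0;
}
```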

@cjpais changed the title from "Add whisper.cpp in llamafile" to "Add whisper.cpp (server) support to llamafile" on Jul 30, 2024
@jart (Collaborator) left a comment

Great work getting the ball rolling on whisperfile @cjpais!

I'm happy to take over the maintenance burden for you.

@jart merged commit fd891be into Mozilla-Ocho:main on Jul 31, 2024 (2 checks passed)
@jart (Collaborator) commented Jul 31, 2024

Just pushed some improvements. Enjoy!

  • b3bdc62 Make whisperfile 12% faster in GPU mode
  • 0849f32 Get CUDA and Metal GPU working in whisperfile
  • 94e9629 Add CLI program to whisperfile

@jart (Collaborator) commented Jul 31, 2024

OK, the good news is that whisperfile works outstandingly well with the "large" model on CPU.

[screenshot: large-model transcription running on CPU]

However, I'm going to disable GPU support by default because it isn't working reliably. Here's CUDA:

[screenshot: CUDA output]

Here's Apple Metal:

[screenshot: Apple Metal output]

I'm not sure what's wrong. It's possible the problem will solve itself the next time I synchronize with llama.cpp upstream. I also checked and it definitely wasn't my performance optimization in the last change. The tiny whisper-tiny.en-q5_1.bin model does seem to work great on GPU (you can pass the --gpu auto flag to try it). So maybe the issue is related to how whisper-medium.en.bin and whisper-large-v3.bin are encoded. Any ideas?
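
For reference, GPU use in whisper.cpp is gated at context creation time. A sketch of what disabling it by default could look like follows; whether llamafile routes its --gpu flag through exactly this field is an assumption, and the model path is a placeholder:

```cpp
#include "whisper.h"

// whisper_context_params carries a use_gpu toggle; defaulting it to false
// matches disabling GPU support by default.
struct whisper_context_params cparams = whisper_context_default_params();
cparams.use_gpu = false;  // CPU path; set true when the user opts in (e.g. --gpu auto)
struct whisper_context * ctx =
    whisper_init_from_file_with_params("whisper-large-v3.bin", cparams);
```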

@cjpais (Collaborator, Author) commented Jul 31, 2024

Thank you @jart!! I must've missed porting some of the CUDA/Metal code from the whisperfile repo; I appreciate you taking such quick care of it. Love the performance improvements on CPU as well!

It's possible it could be something with the encoding; I've run into some issues with the bins before. Also, when I synced the whisper.cpp code I did not sanity-check it, so it may be an underlying issue there. If you don't get to it first, I'll take a closer look toward the end of the week or over the weekend.
