Support for whisper.cpp #17

Any chance of adding support for whisper.cpp? I know whisper.cpp is still stuck with the GGML format instead of GGUF, but it would be great to have portable whisper binaries that just work.

Comments
I agree, Whisper is awesome. I used to use it at the command line (which was very slow). I now use this project, and the inference times are 5-10x faster: https://github.com/jhj0517/Whisper-WebUI |
I would also love speech input support for this, not only because it would be a really cool feature, but also because I sometimes get a bit of RSI so anything to help reduce the amount of typing needed is very helpful. |
Yeah, this would be nice to have in the llama server. llamafile was the only way I figured out to run these things I've been hearing about for a year! |
Speech input is a big feature in my use case, I do it now with GPT-4 on iPhone but doing the same with llamafile's server would be fantastic. What are the main blockers? |
Please, very interested in this use-case! |
Devil's advocate: it's not very difficult to run whisper separately and pipe any recognised sentences into Llamafile? I'm literally doing that right now, for example. It's also relatively easy to do in the browser. What would be the benefit? Would this integration allow the LLM to start processing detected words earlier? |
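For reference, here's a rough sketch of that pipeline, assuming whisper.cpp's `main` binary, a llamafile server on its default port 8080, and `jq` for JSON quoting (the paths, model names, and parameters are placeholders, not anything from this thread):

```sh
# Transcribe a clip with whisper.cpp, then feed the text to a running llamafile
# server's /completion endpoint. Model path, port, and token count are assumptions.
./main -m models/ggml-base.en.bin -f clip.wav --no-timestamps 2>/dev/null \
  | jq -Rs '{prompt: ., n_predict: 128}' \
  | curl -s http://localhost:8080/completion -H "Content-Type: application/json" -d @-
```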
It would have the same benefit that llamafile does. You wouldn't have to compile the software yourself. |
Hi. Is there any update regarding this request? |
I was able to build whisper.cpp using cosmocc with very few modifications.

```diff
diff --git a/Makefile b/Makefile
index 93c89cd..b3a89d7 100644
--- a/Makefile
+++ b/Makefile
@@ -39,7 +39,7 @@ endif
#
CFLAGS = -I. -O3 -DNDEBUG -std=c11 -fPIC
-CXXFLAGS = -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC
+CXXFLAGS = -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -fexceptions
LDFLAGS =
ifdef MACOSX_DEPLOYMENT_TARGET
@@ -134,38 +134,38 @@ ifeq ($(UNAME_M),$(filter $(UNAME_M),x86_64 i686 amd64))
ifdef CPUINFO_CMD
AVX_M := $(shell $(CPUINFO_CMD) | grep -iwE 'AVX|AVX1.0')
ifneq (,$(AVX_M))
- CFLAGS += -mavx
- CXXFLAGS += -mavx
+ CFLAGS += -Xx86_64-mavx
+ CXXFLAGS += -Xx86_64-mavx
endif
AVX2_M := $(shell $(CPUINFO_CMD) | grep -iw 'AVX2')
ifneq (,$(AVX2_M))
- CFLAGS += -mavx2
- CXXFLAGS += -mavx2
+ CFLAGS += -Xx86_64-mavx2
+ CXXFLAGS += -Xx86_64-mavx2
endif
FMA_M := $(shell $(CPUINFO_CMD) | grep -iw 'FMA')
ifneq (,$(FMA_M))
- CFLAGS += -mfma
- CXXFLAGS += -mfma
+ CFLAGS += -Xx86_64-mfma
+ CXXFLAGS += -Xx86_64-mfma
endif
F16C_M := $(shell $(CPUINFO_CMD) | grep -iw 'F16C')
ifneq (,$(F16C_M))
- CFLAGS += -mf16c
- CXXFLAGS += -mf16c
+ CFLAGS += -Xx86_64-mf16c
+ CXXFLAGS += -Xx86_64-mf16c
endif
SSE3_M := $(shell $(CPUINFO_CMD) | grep -iwE 'PNI|SSE3')
ifneq (,$(SSE3_M))
- CFLAGS += -msse3
- CXXFLAGS += -msse3
+ CFLAGS += -Xx86_64-msse3
+ CXXFLAGS += -Xx86_64-msse3
endif
SSSE3_M := $(shell $(CPUINFO_CMD) | grep -iw 'SSSE3')
ifneq (,$(SSSE3_M))
- CFLAGS += -mssse3
- CXXFLAGS += -mssse3
+ CFLAGS += -Xx86_64-mssse3
+ CXXFLAGS += -Xx86_64-mssse3
endif
endif
endif
diff --git a/ggml.c b/ggml.c
index 4ee2c5e..521eafe 100644
--- a/ggml.c
+++ b/ggml.c
@@ -24,7 +24,7 @@
#include <stdarg.h>
#include <signal.h>
#if defined(__gnu_linux__)
-#include <syscall.h>
+#include <sys/syscall.h>
#endif
#ifdef GGML_USE_METAL
@@ -2069,6 +2069,8 @@ void ggml_numa_init(enum ggml_numa_strategy numa_flag) {
int getcpu_ret = 0;
#if __GLIBC__ > 2 || (__GLIBC__ == 2 && __GLIBC_MINOR__ > 28)
getcpu_ret = getcpu(&current_cpu, &g_state.numa.current_node);
+#elif defined(__COSMOPOLITAN__)
+ current_cpu = sched_getcpu(), getcpu_ret = 0;
#else
// old glibc doesn't have a wrapper for this call. Fall back on direct syscall
getcpu_ret = syscall(SYS_getcpu,&current_cpu,&g_state.numa.current_node);
```

I made a couple of changes to cosmopolitan upstream that'll be incorporated in the next release to make it easier to build. More work would need to be done to do it as well as llamafile packages llama.cpp. But until then, you have this: |
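For anyone wanting to reproduce this, the build is just the stock whisper.cpp Makefile driven by the cosmocc toolchain once the patch above is applied. A rough sketch (toolchain location and model choice are assumptions):

```sh
# Save the diff above as cosmocc.patch, then build whisper.cpp with cosmopolitan.
git clone https://github.com/ggerganov/whisper.cpp
cd whisper.cpp
git apply ../cosmocc.patch
bash ./models/download-ggml-model.sh base.en       # fetch a small test model
make CC=cosmocc CXX=cosmoc++ main                  # assumes cosmocc/cosmoc++ are on PATH
./main -m models/ggml-base.en.bin -f samples/jfk.wav
```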
Wow, thanks @jart! That's amazing!

```
$ whisperfile -m ggml-model-q5_0.bin samples/jfk.wav
whisper_init_from_file_with_params_no_state: loading model from 'ggml-model-q5_0.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab = 51866
whisper_model_load: n_audio_ctx = 1500
whisper_model_load: n_audio_state = 1280
whisper_model_load: n_audio_head = 20
whisper_model_load: n_audio_layer = 32
whisper_model_load: n_text_ctx = 448
whisper_model_load: n_text_state = 1280
whisper_model_load: n_text_head = 20
whisper_model_load: n_text_layer = 32
whisper_model_load: n_mels = 128
whisper_model_load: ftype = 8
whisper_model_load: qntvr = 2
whisper_model_load: type = 5 (large v3)
whisper_model_load: adding 1609 extra tokens
whisper_model_load: n_langs = 100
whisper_model_load: CPU total size = 1080.47 MB
whisper_model_load: model size = 1080.47 MB
whisper_init_state: kv self size = 220.20 MB
whisper_init_state: kv cross size = 245.76 MB
whisper_init_state: compute buffer (conv) = 36.26 MB
whisper_init_state: compute buffer (encode) = 926.66 MB
whisper_init_state: compute buffer (cross) = 9.38 MB
whisper_init_state: compute buffer (decode) = 209.26 MB
system_info: n_threads = 4 / 12 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | METAL = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | CUDA = 0 | COREML = 0 | OPENVINO = 0 |
main: processing 'samples/jfk.wav' (176000 samples, 11.0 sec), 4 threads, 1 processors, 5 beams + best of 5, lang = en, task = transcribe, timestamps = 1 ...
[00:00:00.060 --> 00:00:07.500] And so, my dear Americans, do not ask what your country can do for you.
[00:00:07.500 --> 00:00:11.000] Ask what you can do for your country.
whisper_print_timings: load time = 1281.10 ms
whisper_print_timings: fallbacks = 0 p / 0 h
whisper_print_timings: mel time = 41.95 ms
whisper_print_timings: sample time = 102.85 ms / 159 runs ( 0.65 ms per run)
whisper_print_timings: encode time = 29479.98 ms / 1 runs (29479.98 ms per run)
whisper_print_timings: decode time = 38.76 ms / 1 runs ( 38.76 ms per run)
whisper_print_timings: batchd time = 3710.61 ms / 156 runs ( 23.79 ms per run)
whisper_print_timings: prompt time = 0.00 ms / 1 runs ( 0.00 ms per run)
whisper_print_timings: total time = 34662.24 ms
```

Any instructions on how to package it together with a GGML model? Thanks again! |
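Not an official answer, but llamafile's usual packaging recipe is to store the weights (plus an optional `.args` file of default flags) inside the executable's own zip archive with the bundled `zipalign` tool. If the same workflow carries over to a whisper build, it would look roughly like this (file names are placeholders):

```sh
# Hedged sketch based on how llamafile packages llama.cpp weights; untested for whisper.
cp whisperfile whisper-large-v3.llamafile
printf -- '-m\nggml-model-q5_0.bin\n' > .args       # default arguments baked into the binary
o//llamafile/zipalign -j0 whisper-large-v3.llamafile ggml-model-q5_0.bin .args
```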
I just tried to compile it myself, to see if I was able to also get the stream example working, but I got:

```
whisper.cpp:2575:27: error: exponent has no digits
2575 | double theta = (2*M_PI*i)/SIN_COS_N_COUNT;
| ^~~~
whisper.cpp:2672:42: error: exponent has no digits
2672 | output[i] = 0.5*(1.0 - cosf((2.0*M_PI*i)/(length + offset)));
| ^~~~
```

I run it like this: `$ make CC=bin/cosmocc CXX=bin/cosmoc++ stream` |
@versae in your cosmocc toolchain just change the definitions of these constants to:

```c
#define M_E        2.7182818284590452354  /* 𝑒 */
#define M_LOG2E 1.4426950408889634074 /* log₂𝑒 */
#define M_LOG10E 0.43429448190325182765 /* log₁₀𝑒 */
#define M_LN2 0.69314718055994530942 /* logₑ2 */
#define M_LN10 2.30258509299404568402 /* logₑ10 */
#define M_PI 3.14159265358979323846 /* pi */
#define M_PI_2 1.57079632679489661923 /* pi/2 */
#define M_PI_4 0.78539816339744830962 /* pi/4 */
#define M_1_PI 0.31830988618379067154 /* 1/pi */
#define M_2_PI 0.63661977236758134308 /* 2/pi */
#define M_2_SQRTPI 1.12837916709551257390 /* 2/sqrt(pi) */
#define M_SQRT2 1.41421356237309504880 /* sqrt(2) */
#define M_SQRT1_2  0.70710678118654752440  /* 1/sqrt(2) */
```

This will ship in the next cosmocc release. |
After some tweaking, I was able to compile my own cosmocc and then use it to compile the stream example, but now I get:

```
/usr/include/SDL2/SDL_config.h:4:10: fatal error: SDL2/_real_SDL_config.h: No such file or directory
4 | #include <SDL2/_real_SDL_config.h>
| ^~~~~~~~~~~~~~~~~~~~~~~~~
compilation terminated.
make: *** [Makefile:402: stream] Error 1
```

I'll keep investigating, as this could easily be some rookie mistake on my side. Would it make sense to create a separate llamafile-style project for whisper.cpp? |
I would suggest calling it Whisperfile :) |
So I have been using a more generic method to get voice in/out, and it works flawlessly. My issue has been getting the model to load at a decent speed. I have many variations of this code, and I don't know whether this particular one is broken, but I do know the voice in/out works flawlessly. Edit: it's also in Python, so I'm not sure whether it helps, but even if it doesn't, sometimes simplicity is best; text-to-speech can go a long way. |
@jart I am in progress on getting a version of whisper.cpp built with llamafile, specifically the server example. The executable itself is working and seems to be compiling properly for CUDA. However, I would love some help with the file loading from within the zipaligned archive. If you could provide some guidance on what needs to be done to implement this portion, that would be great. I have replaced the relevant file-loading code and am currently using the files in llamafile/ (llamafile.c in particular) as a reference. |
If you've already discovered llamafile/llamafile.c then I'm not sure what other high level guidance I can offer you. |
Thanks @jart, that was all I needed. My C skills are a bit rusty, so it was great to know I wasn't missing anything obvious; I was just forgetting some C basics. For the time being I've forked llamafile into https://github.com/cjpais/whisperfile. If it makes sense to integrate directly into llamafile, I am happy to clean up the code and submit a PR. If so, just let me know how you would like the dirs to be structured. |
Why not just try to integrate/cosmopolitan-ize talk-llama into llamafile? Didn’t he already do all the heavy-lifting around this perhaps? |
It can also be done, and is probably fairly easy to do in the whisperfile repo. I only needed the server example for my use case. |
I believe this issue can be closed now that we have whisperfile |
The whisperfile project @cjpais posted earlier has now been made an official Mozilla project. See the whisper.cpp/ folder of the llamafile codebase. Releases have been published to https://huggingface.co/Mozilla/whisperfile, so enjoy, everyone! |
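Usage mirrors llamafile: download a release, mark it executable, and point it at an audio file. The file name below is a placeholder; check the Hugging Face repo for the actual release names:

```sh
# Hypothetical release name; the weights are embedded, so no -m flag should be needed.
wget https://huggingface.co/Mozilla/whisperfile/resolve/main/whisper-tiny.en.llamafile
chmod +x whisper-tiny.en.llamafile
./whisper-tiny.en.llamafile -f recording.wav
```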