Support for whisper.cpp #17

Any chance of adding support for whisper.cpp? I know whisper.cpp is still stuck with the GGML format instead of GGUF, but it would be great to have portable whisper binaries that just work.

Comments
I agree, Whisper is awesome. I used to use it at the command line (which was very slow). I now use this project, and the inference times are 5-10x faster: https://github.com/jhj0517/Whisper-WebUI |
I would also love speech input support for this, not only because it would be a really cool feature, but also because I sometimes get a bit of RSI so anything to help reduce the amount of typing needed is very helpful. |
Yeah, this would be nice to have in the llama server. llamafile was the only way I figured out to run these things I've been hearing about for a year! |
Speech input is a big feature in my use case, I do it now with GPT-4 on iPhone but doing the same with llamafile's server would be fantastic. What are the main blockers? |
Please, very interested in this use-case! |
Devil's advocate: it's not very difficult to run whisper separately and pipe any recognised sentences into Llamafile? I'm literally doing that right now, for example. It's also relatively easy to do in the browser. What would be the benefit? Would this integration allow the LLM to start processing detected words earlier? |
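For reference, here's a rough sketch of that pipeline, assuming whisper.cpp's `main` binary, a llamafile server on its default port 8080, and `jq` for JSON quoting (the paths, model names, and parameters are placeholders, not anything from this thread):

```sh
# Transcribe a clip with whisper.cpp, then feed the text to a running llamafile
# server's /completion endpoint. Model path, port, and token count are assumptions.
./main -m models/ggml-base.en.bin -f clip.wav --no-timestamps 2>/dev/null \
  | jq -Rs '{prompt: ., n_predict: 128}' \
  | curl -s http://localhost:8080/completion -H "Content-Type: application/json" -d @-
```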
It would have the same benefit that llamafile does. You wouldn't have to compile the software yourself. |
Hi. Is there any update regarding this request? |
I was able to build whisper.cpp using cosmocc with very few modifications.

```diff
diff --git a/Makefile b/Makefile
index 93c89cd..b3a89d7 100644
--- a/Makefile
+++ b/Makefile
@@ -39,7 +39,7 @@ endif
#
CFLAGS = -I. -O3 -DNDEBUG -std=c11 -fPIC
-CXXFLAGS = -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC
+CXXFLAGS = -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -fexceptions
LDFLAGS =
ifdef MACOSX_DEPLOYMENT_TARGET
@@ -134,38 +134,38 @@ ifeq ($(UNAME_M),$(filter $(UNAME_M),x86_64 i686 amd64))
ifdef CPUINFO_CMD
AVX_M := $(shell $(CPUINFO_CMD) | grep -iwE 'AVX|AVX1.0')
ifneq (,$(AVX_M))
- CFLAGS += -mavx
- CXXFLAGS += -mavx
+ CFLAGS += -Xx86_64-mavx
+ CXXFLAGS += -Xx86_64-mavx
endif
AVX2_M := $(shell $(CPUINFO_CMD) | grep -iw 'AVX2')
ifneq (,$(AVX2_M))
- CFLAGS += -mavx2
- CXXFLAGS += -mavx2
+ CFLAGS += -Xx86_64-mavx2
+ CXXFLAGS += -Xx86_64-mavx2
endif
FMA_M := $(shell $(CPUINFO_CMD) | grep -iw 'FMA')
ifneq (,$(FMA_M))
- CFLAGS += -mfma
- CXXFLAGS += -mfma
+ CFLAGS += -Xx86_64-mfma
+ CXXFLAGS += -Xx86_64-mfma
endif
F16C_M := $(shell $(CPUINFO_CMD) | grep -iw 'F16C')
ifneq (,$(F16C_M))
- CFLAGS += -mf16c
- CXXFLAGS += -mf16c
+ CFLAGS += -Xx86_64-mf16c
+ CXXFLAGS += -Xx86_64-mf16c
endif
SSE3_M := $(shell $(CPUINFO_CMD) | grep -iwE 'PNI|SSE3')
ifneq (,$(SSE3_M))
- CFLAGS += -msse3
- CXXFLAGS += -msse3
+ CFLAGS += -Xx86_64-msse3
+ CXXFLAGS += -Xx86_64-msse3
endif
SSSE3_M := $(shell $(CPUINFO_CMD) | grep -iw 'SSSE3')
ifneq (,$(SSSE3_M))
- CFLAGS += -mssse3
- CXXFLAGS += -mssse3
+ CFLAGS += -Xx86_64-mssse3
+ CXXFLAGS += -Xx86_64-mssse3
endif
endif
endif
diff --git a/ggml.c b/ggml.c
index 4ee2c5e..521eafe 100644
--- a/ggml.c
+++ b/ggml.c
@@ -24,7 +24,7 @@
#include <stdarg.h>
#include <signal.h>
#if defined(__gnu_linux__)
-#include <syscall.h>
+#include <sys/syscall.h>
#endif
#ifdef GGML_USE_METAL
@@ -2069,6 +2069,8 @@ void ggml_numa_init(enum ggml_numa_strategy numa_flag) {
int getcpu_ret = 0;
#if __GLIBC__ > 2 || (__GLIBC__ == 2 && __GLIBC_MINOR__ > 28)
getcpu_ret = getcpu(&current_cpu, &g_state.numa.current_node);
+#elif defined(__COSMOPOLITAN__)
+ current_cpu = sched_getcpu(), getcpu_ret = 0;
#else
// old glibc doesn't have a wrapper for this call. Fall back on direct syscall
getcpu_ret = syscall(SYS_getcpu,&current_cpu,&g_state.numa.current_node);
```

I made a couple of changes to cosmopolitan upstream that'll be incorporated in the next release to make it easier to build. More work would need to be done to do it as well as llamafile packages llama.cpp. But until then, you have this: |
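For anyone wanting to reproduce this, the build is just the stock whisper.cpp Makefile driven by the cosmocc toolchain once the patch above is applied. A rough sketch (toolchain location and model choice are assumptions):

```sh
# Save the diff above as cosmocc.patch, then build whisper.cpp with cosmopolitan.
git clone https://github.com/ggerganov/whisper.cpp
cd whisper.cpp
git apply ../cosmocc.patch
bash ./models/download-ggml-model.sh base.en       # fetch a small test model
make CC=cosmocc CXX=cosmoc++ main                  # assumes cosmocc/cosmoc++ are on PATH
./main -m models/ggml-base.en.bin -f samples/jfk.wav
```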
Wow, thanks @jart! That's amazing!

```
$ whisperfile -m ggml-model-q5_0.bin samples/jfk.wav
whisper_init_from_file_with_params_no_state: loading model from 'ggml-model-q5_0.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab = 51866
whisper_model_load: n_audio_ctx = 1500
whisper_model_load: n_audio_state = 1280
whisper_model_load: n_audio_head = 20
whisper_model_load: n_audio_layer = 32
whisper_model_load: n_text_ctx = 448
whisper_model_load: n_text_state = 1280
whisper_model_load: n_text_head = 20
whisper_model_load: n_text_layer = 32
whisper_model_load: n_mels = 128
whisper_model_load: ftype = 8
whisper_model_load: qntvr = 2
whisper_model_load: type = 5 (large v3)
whisper_model_load: adding 1609 extra tokens
whisper_model_load: n_langs = 100
whisper_model_load: CPU total size = 1080.47 MB
whisper_model_load: model size = 1080.47 MB
whisper_init_state: kv self size = 220.20 MB
whisper_init_state: kv cross size = 245.76 MB
whisper_init_state: compute buffer (conv) = 36.26 MB
whisper_init_state: compute buffer (encode) = 926.66 MB
whisper_init_state: compute buffer (cross) = 9.38 MB
whisper_init_state: compute buffer (decode) = 209.26 MB
system_info: n_threads = 4 / 12 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | METAL = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | CUDA = 0 | COREML = 0 | OPENVINO = 0 |
main: processing 'samples/jfk.wav' (176000 samples, 11.0 sec), 4 threads, 1 processors, 5 beams + best of 5, lang = en, task = transcribe, timestamps = 1 ...
[00:00:00.060 --> 00:00:07.500] And so, my dear Americans, do not ask what your country can do for you.
[00:00:07.500 --> 00:00:11.000] Ask what you can do for your country.
whisper_print_timings: load time = 1281.10 ms
whisper_print_timings: fallbacks = 0 p / 0 h
whisper_print_timings: mel time = 41.95 ms
whisper_print_timings: sample time = 102.85 ms / 159 runs ( 0.65 ms per run)
whisper_print_timings: encode time = 29479.98 ms / 1 runs (29479.98 ms per run)
whisper_print_timings: decode time = 38.76 ms / 1 runs ( 38.76 ms per run)
whisper_print_timings: batchd time = 3710.61 ms / 156 runs ( 23.79 ms per run)
whisper_print_timings: prompt time = 0.00 ms / 1 runs ( 0.00 ms per run)
whisper_print_timings: total time = 34662.24 ms
```

Any instructions on how to package it together with a GGML model? Thanks again! |
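Not an official answer, but llamafile's usual packaging recipe is to store the weights (plus an optional `.args` file of default flags) inside the executable's own zip archive with the bundled `zipalign` tool. If the same workflow carries over to a whisper build, it would look roughly like this (file names are placeholders):

```sh
# Hedged sketch based on how llamafile packages llama.cpp weights; untested for whisper.
cp whisperfile whisper-large-v3.llamafile
printf -- '-m\nggml-model-q5_0.bin\n' > .args       # default arguments baked into the binary
o//llamafile/zipalign -j0 whisper-large-v3.llamafile ggml-model-q5_0.bin .args
```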
I just tried to compile it myself, to see if I was able to also get the stream example working, but I got:

```
whisper.cpp:2575:27: error: exponent has no digits
2575 | double theta = (2*M_PI*i)/SIN_COS_N_COUNT;
| ^~~~
whisper.cpp:2672:42: error: exponent has no digits
2672 | output[i] = 0.5*(1.0 - cosf((2.0*M_PI*i)/(length + offset)));
| ^~~~
```

I run it like this: `$ make CC=bin/cosmocc CXX=bin/cosmoc++ stream` |
@versae in your cosmocc toolchain just change the definitions of these constants to:

```c
#define M_E        2.7182818284590452354  /* 𝑒 */
#define M_LOG2E 1.4426950408889634074 /* log₂𝑒 */
#define M_LOG10E 0.43429448190325182765 /* log₁₀𝑒 */
#define M_LN2 0.69314718055994530942 /* logₑ2 */
#define M_LN10 2.30258509299404568402 /* logₑ10 */
#define M_PI 3.14159265358979323846 /* pi */
#define M_PI_2 1.57079632679489661923 /* pi/2 */
#define M_PI_4 0.78539816339744830962 /* pi/4 */
#define M_1_PI 0.31830988618379067154 /* 1/pi */
#define M_2_PI 0.63661977236758134308 /* 2/pi */
#define M_2_SQRTPI 1.12837916709551257390 /* 2/sqrt(pi) */
#define M_SQRT2 1.41421356237309504880 /* sqrt(2) */
#define M_SQRT1_2  0.70710678118654752440  /* 1/sqrt(2) */
```

This will ship in the next cosmocc release. |
After some tweaking, I was able to compile my own cosmocc and then use it to compile the stream example, but now I get:

```
/usr/include/SDL2/SDL_config.h:4:10: fatal error: SDL2/_real_SDL_config.h: No such file or directory
4 | #include <SDL2/_real_SDL_config.h>
| ^~~~~~~~~~~~~~~~~~~~~~~~~
compilation terminated.
make: *** [Makefile:402: stream] Error 1
```

I'll keep investigating, as this could easily be some rookie mistake on my side. Would it make sense to create a separate llamafile-style project for whisper.cpp? |
I would suggest calling it Whisperfile :) |
So I have been using a more generic method to get voice in/out, and it works flawlessly. My issue has been getting the model to load at a decent speed. I have many variations of this code, and I don't know whether this particular one is broken, but I do know the voice in/out works flawlessly. Edit: it's also in Python, so I'm not sure whether it helps, but even if it doesn't, sometimes simplicity is best; text-to-speech can go a long way. |
@jart I am in progress on getting a version of whisper.cpp built with llamafile, specifically the server example. The executable itself is working and seems to be compiling properly for CUDA. However, I would love some help with the file loading from within the zipaligned archive. If you could provide some guidance on what needs to be done to implement this portion, that would be great. I have replaced the relevant file-loading code and am currently using the files in llamafile/ (llamafile.c in particular) as a reference. |
If you've already discovered llamafile/llamafile.c then I'm not sure what other high level guidance I can offer you. |
Thanks @jart, that was all I needed. My C skills are a bit rusty, so it was great to know I wasn't missing anything obvious; I was just forgetting some C basics. For the time being I've forked llamafile into https://github.com/cjpais/whisperfile. If it makes sense to integrate directly into llamafile, I am happy to clean up the code and submit a PR. If so, just let me know how you would like the dirs to be structured. |
Why not just try to integrate/cosmopolitan-ize talk-llama into llamafile? Didn’t he already do all the heavy-lifting around this perhaps? |
It can also be done, and is probably fairly easy to do in the whisperfile repo. I only needed the server example for my use case. |
I believe this issue can be closed now that we have whisperfile |
The whisperfile project @cjpais posted earlier has now been made an official Mozilla project. See the whisper.cpp/ folder of the llamafile codebase. Releases have been published to https://huggingface.co/Mozilla/whisperfile, so enjoy, everyone! |
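Usage mirrors llamafile: download a release, mark it executable, and point it at an audio file. The file name below is a placeholder; check the Hugging Face repo for the actual release names:

```sh
# Hypothetical release name; the weights are embedded, so no -m flag should be needed.
wget https://huggingface.co/Mozilla/whisperfile/resolve/main/whisper-tiny.en.llamafile
chmod +x whisper-tiny.en.llamafile
./whisper-tiny.en.llamafile -f recording.wav
```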