Integrate CUDA support #27
We are now at the point where the following should at least work, once we integrate https://github.com/ggerganov/whisper.cpp/blob/master/Makefile#L210-L222 into Makevars and compile the CUDA parts.
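For reference, a minimal sketch of how such a CUDA-enabled install could then be triggered from R (the WHISPER_CUBLAS and CUDA_PATH environment variables follow what is used later in this thread; the toolkit location is an assumption for your system):

> Sys.setenv(WHISPER_CUBLAS = "1")          # opt in to the cuBLAS build
> Sys.setenv(CUDA_PATH = "/usr/local/cuda") # assumed toolkit location
> remotes::install_github("bnosac/audio.whisper", force = TRUE)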
Just a heads up that I have access to a Windows machine with CUDA, in case it would be helpful for me to do tests and benchmarks like I did for macOS.
That would indeed be great.
OS Name: Microsoft Windows 10 Enterprise

Trace:

> remotes::install_github("bnosac/audio.whisper", force = TRUE)
The downloaded binary packages are in
C:\Users\j553g371\AppData\Local\Temp\RtmpIhem3s\downloaded_packages
Running `R CMD build`...
* checking for file 'C:\Users\j553g371\AppData\Local\Temp\RtmpIhem3s\remotes3074262171b7\bnosac-audio.whisper-ecdb06d/DESCRIPTION' ... OK
* preparing 'audio.whisper':
* checking DESCRIPTION meta-information ... OK
* cleaning src
* checking for LF line-endings in source and make files and shell scripts
* checking for empty or unneeded directories
* building 'audio.whisper_0.3.2.tar.gz'
* installing *source* package 'audio.whisper' ...
** using staged installation
** libs
using C++ compiler: 'G__~1.EXE (GCC) 12.3.0'
using C++11
I whisper.cpp build info:
I UNAME_S: MSYS_NT-10.0-19044
I UNAME_P: unknown
I UNAME_M: x86_64
I PKG_CFLAGS: -mavx -mf16c -msse3 -mssse3 -D_XOPEN_SOURCE=600
I PKG_CPPFLAGS: -mavx -mf16c -msse3 -mssse3 -DSTRICT_R_HEADERS -I./dr_libs -I./whisper_cpp -D_XOPEN_SOURCE=600
I PKG_LIBS:
gcc -I"C:/Users/j553g371/AppData/Local/Programs/R/R-43~1.2/include" -DNDEBUG -mavx -mf16c -msse3 -mssse3 -DSTRICT_R_HEADERS -I./dr_libs -I./whisper_cpp -D_XOPEN_SOURCE=600 -I'C:/Users/j553g371/AppData/Local/Programs/R/R-4.3.2/library/Rcpp/include' -I"C:/rtools43/x86_64-w64-mingw32.static.posix/include" -mavx -mf16c -msse3 -mssse3 -D_XOPEN_SOURCE=600 -O2 -Wall -mfpmath=sse -msse2 -mstackrealign -c whisper_cpp/ggml-quants.c -o whisper_cpp/ggml-quants.o
whisper_cpp/ggml-quants.c:1337:14: warning: 'make_qkx1_quants' defined but not used [-Wunused-function]
1337 | static float make_qkx1_quants(int n, int nmax, const float * restrict x, uint8_t * restrict L, float * restrict the_min,
| ^~~~~~~~~~~~~~~~
gcc -I"C:/Users/j553g371/AppData/Local/Programs/R/R-43~1.2/include" -DNDEBUG -mavx -mf16c -msse3 -mssse3 -DSTRICT_R_HEADERS -I./dr_libs -I./whisper_cpp -D_XOPEN_SOURCE=600 -I'C:/Users/j553g371/AppData/Local/Programs/R/R-4.3.2/library/Rcpp/include' -I"C:/rtools43/x86_64-w64-mingw32.static.posix/include" -mavx -mf16c -msse3 -mssse3 -D_XOPEN_SOURCE=600 -O2 -Wall -mfpmath=sse -msse2 -mstackrealign -c whisper_cpp/ggml-backend.c -o whisper_cpp/ggml-backend.o
whisper_cpp/ggml-backend.c:841:13: warning: 'sched_print_assignments' defined but not used [-Wunused-function]
841 | static void sched_print_assignments(ggml_backend_sched_t sched, struct ggml_cgraph * graph) {
| ^~~~~~~~~~~~~~~~~~~~~~~
gcc -I"C:/Users/j553g371/AppData/Local/Programs/R/R-43~1.2/include" -DNDEBUG -mavx -mf16c -msse3 -mssse3 -DSTRICT_R_HEADERS -I./dr_libs -I./whisper_cpp -D_XOPEN_SOURCE=600 -I'C:/Users/j553g371/AppData/Local/Programs/R/R-4.3.2/library/Rcpp/include' -I"C:/rtools43/x86_64-w64-mingw32.static.posix/include" -mavx -mf16c -msse3 -mssse3 -D_XOPEN_SOURCE=600 -O2 -Wall -mfpmath=sse -msse2 -mstackrealign -c whisper_cpp/ggml-alloc.c -o whisper_cpp/ggml-alloc.o
gcc -I"C:/Users/j553g371/AppData/Local/Programs/R/R-43~1.2/include" -DNDEBUG -mavx -mf16c -msse3 -mssse3 -DSTRICT_R_HEADERS -I./dr_libs -I./whisper_cpp -D_XOPEN_SOURCE=600 -I'C:/Users/j553g371/AppData/Local/Programs/R/R-4.3.2/library/Rcpp/include' -I"C:/rtools43/x86_64-w64-mingw32.static.posix/include" -mavx -mf16c -msse3 -mssse3 -D_XOPEN_SOURCE=600 -O2 -Wall -mfpmath=sse -msse2 -mstackrealign -c whisper_cpp/ggml.c -o whisper_cpp/ggml.o
whisper_cpp/ggml.c:17593:13: warning: 'ggml_opt_get_grad' defined but not used [-Wunused-function]
17593 | static void ggml_opt_get_grad(int np, struct ggml_tensor * const ps[], float * g) {
| ^~~~~~~~~~~~~~~~~
g++ -std=gnu++11 -I"C:/Users/j553g371/AppData/Local/Programs/R/R-43~1.2/include" -DNDEBUG -mavx -mf16c -msse3 -mssse3 -DSTRICT_R_HEADERS -I./dr_libs -I./whisper_cpp -D_XOPEN_SOURCE=600 -I'C:/Users/j553g371/AppData/Local/Programs/R/R-4.3.2/library/Rcpp/include' -I"C:/rtools43/x86_64-w64-mingw32.static.posix/include" -O2 -Wall -mfpmath=sse -msse2 -mstackrealign -c whisper_cpp/whisper.cpp -o whisper_cpp/whisper.o
whisper_cpp/whisper.cpp:203:29: warning: 'ggml_tensor* ggml_mul_mat_pad(ggml_context*, ggml_tensor*, ggml_tensor*, int)' defined but not used [-Wunused-function]
203 | static struct ggml_tensor * ggml_mul_mat_pad(struct ggml_context * ctx, struct ggml_tensor * x, struct ggml_tensor * y, int pad = 32) {
| ^~~~~~~~~~~~~~~~
g++ -std=gnu++11 -I"C:/Users/j553g371/AppData/Local/Programs/R/R-43~1.2/include" -DNDEBUG -mavx -mf16c -msse3 -mssse3 -DSTRICT_R_HEADERS -I./dr_libs -I./whisper_cpp -D_XOPEN_SOURCE=600 -I'C:/Users/j553g371/AppData/Local/Programs/R/R-4.3.2/library/Rcpp/include' -I"C:/rtools43/x86_64-w64-mingw32.static.posix/include" -O2 -Wall -mfpmath=sse -msse2 -mstackrealign -c whisper_cpp/common-ggml.cpp -o whisper_cpp/common-ggml.o
g++ -std=gnu++11 -I"C:/Users/j553g371/AppData/Local/Programs/R/R-43~1.2/include" -DNDEBUG -mavx -mf16c -msse3 -mssse3 -DSTRICT_R_HEADERS -I./dr_libs -I./whisper_cpp -D_XOPEN_SOURCE=600 -I'C:/Users/j553g371/AppData/Local/Programs/R/R-4.3.2/library/Rcpp/include' -I"C:/rtools43/x86_64-w64-mingw32.static.posix/include" -O2 -Wall -mfpmath=sse -msse2 -mstackrealign -c whisper_cpp/common.cpp -o whisper_cpp/common.o
g++ -std=gnu++11 -I"C:/Users/j553g371/AppData/Local/Programs/R/R-43~1.2/include" -DNDEBUG -mavx -mf16c -msse3 -mssse3 -DSTRICT_R_HEADERS -I./dr_libs -I./whisper_cpp -D_XOPEN_SOURCE=600 -I'C:/Users/j553g371/AppData/Local/Programs/R/R-4.3.2/library/Rcpp/include' -I"C:/rtools43/x86_64-w64-mingw32.static.posix/include" -O2 -Wall -mfpmath=sse -msse2 -mstackrealign -c rcpp_whisper.cpp -o rcpp_whisper.o
g++ -std=gnu++11 -I"C:/Users/j553g371/AppData/Local/Programs/R/R-43~1.2/include" -DNDEBUG -mavx -mf16c -msse3 -mssse3 -DSTRICT_R_HEADERS -I./dr_libs -I./whisper_cpp -D_XOPEN_SOURCE=600 -I'C:/Users/j553g371/AppData/Local/Programs/R/R-4.3.2/library/Rcpp/include' -I"C:/rtools43/x86_64-w64-mingw32.static.posix/include" -O2 -Wall -mfpmath=sse -msse2 -mstackrealign -c RcppExports.cpp -o RcppExports.o
g++ -shared -s -static-libgcc -o audio.whisper.dll tmp.def whisper_cpp/ggml-quants.o whisper_cpp/ggml-backend.o whisper_cpp/ggml-alloc.o whisper_cpp/ggml.o whisper_cpp/whisper.o whisper_cpp/common-ggml.o whisper_cpp/common.o rcpp_whisper.o RcppExports.o -LC:/rtools43/x86_64-w64-mingw32.static.posix/lib/x64 -LC:/rtools43/x86_64-w64-mingw32.static.posix/lib -LC:/Users/j553g371/AppData/Local/Programs/R/R-43~1.2/bin/x64 -lR
installing to C:/Users/j553g371/AppData/Local/Programs/R/R-4.3.2/library/00LOCK-audio.whisper/00new/audio.whisper/libs/x64
** R
** inst
** byte-compile and prepare package for lazy loading
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded from temporary location
** testing if installed package can be loaded from final location
** testing if installed package keeps a record of temporary installation path
* DONE (audio.whisper)

Use GPU:

> library(audio.whisper)
> model <- whisper("medium", use_gpu = TRUE)
trying URL 'https://huggingface.co/ggerganov/whisper.cpp/resolve/d15393806e24a74f60827e23e986f0c10750b358/ggml-medium.bin'
Content type 'application/octet-stream' length 1533763059 bytes (1462.7 MB)
downloaded 1462.7 MB
Downloading finished, model stored at 'C:/Users/j553g371/Documents/ggml-medium.bin'
whisper_init_from_file_with_params_no_state: loading model from 'C:/Users/j553g371/Documents/ggml-medium.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab = 51865
whisper_model_load: n_audio_ctx = 1500
whisper_model_load: n_audio_state = 1024
whisper_model_load: n_audio_head = 16
whisper_model_load: n_audio_layer = 24
whisper_model_load: n_text_ctx = 448
whisper_model_load: n_text_state = 1024
whisper_model_load: n_text_head = 16
whisper_model_load: n_text_layer = 24
whisper_model_load: n_mels = 80
whisper_model_load: ftype = 1
whisper_model_load: qntvr = 0
whisper_model_load: type = 4 (medium)
whisper_model_load: adding 1608 extra tokens
whisper_model_load: n_langs = 99
whisper_model_load: CPU buffer size = 1533.52 MB
whisper_model_load: model size = 1533.14 MB
whisper_init_state: kv self size = 132.12 MB
whisper_init_state: kv cross size = 147.46 MB
whisper_init_state: compute buffer (conv) = 25.61 MB
whisper_init_state: compute buffer (encode) = 170.28 MB
whisper_init_state: compute buffer (cross) = 7.85 MB
whisper_init_state: compute buffer (decode) = 98.32 MB
What does …
Can you show all files which are (recursively) at /c/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v12.2? Something like the sketch below. Does …
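A sketch of the kind of listing meant here, assuming R is the easiest route:

> list.files("C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v12.2", recursive = TRUE, full.names = TRUE)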
See attached for the full list of files.
Note as well that I have system variables for …
Thanks. With that information, I think I can add something to the Makevars to compile the CUDA source code.
There's nothing there with cuda, nvidia, or nv.
I've set up continuous integration with CUDA in the …
On Windows it looks like the nvcc executable from CUDA needs MSVC. I've now let it create a fat binary for all the GPU architectures which nvcc lists:
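If your nvcc version supports it, that list can be queried directly; a sketch from R (--list-gpu-arch is, to my knowledge, the nvcc flag that prints the supported virtual architectures):

> system("nvcc --list-gpu-arch")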
Error when trying to install without Visual Studio. Trace:
Trying to install MSVC now...
Yes, on Windows you need MSVC.
Trace:
Yep, got similar errors in run https://github.com/bnosac/audio.whisper/actions/runs/7965676963/job/21745662097

Trace for CUDA 12.2.0:
Trace for CUDA 11.8.0:
These R_ext/Complex.h errors for CUDA 11.8.0 are there because I made sure printing goes to the R console instead of stderr.
When disabling printing as indicated at #27 (comment) …
Notes when spinning up a p3.2xlarge (Tesla V100, 16GB GPU RAM - NVIDIA Corporation GV100GL [Tesla V100 SXM2 16GB]) on AWS.
Don't forget to set WHISPER_CUBLAS=1. Note to future self: installed CUDA and NVIDIA drivers as follows on Ubuntu 22.04: …
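Whatever install route is taken, a quick sanity check from R that the driver and toolkit are visible (nvidia-smi ships with the driver, nvcc with the toolkit; both are assumed to be on the PATH):

> system("nvidia-smi")     # does the driver see the GPU?
> system("nvcc --version") # is the toolkit on the PATH?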
Quick check to see if whisper.cpp itself provides the same information.
Same timing with whisper.cpp.
So, in conclusion: Tesla V100, 16GB GPU RAM. CUDA with R works on this Linux machine.
@jmgirard I think I'll already include the changes that allow installing and transcribing on Linux with CUDA, as that works. Will try later to see if we can make it work on Windows. But apparently it needs to link to culibos and rt (see #27 (comment)) and I don't know if that is available at all when you install the CUDA drivers on Windows. At least it is not available on the continuous integration run under C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8/lib/x64 (maybe it is on your machine?)
> list.files("C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v12.3/lib/x64")
[1] "cublas.lib" "cublasLt.lib"
[3] "cuda.lib" "cudadevrt.lib"
[5] "cudart.lib" "cudart_static.lib"
[7] "cufft.lib" "cufftw.lib"
[9] "cufilt.lib" "curand.lib"
[11] "cusolver.lib" "cusolverMg.lib"
[13] "cusparse.lib" "nppc.lib"
[15] "nppial.lib" "nppicc.lib"
[17] "nppidei.lib" "nppif.lib"
[19] "nppig.lib" "nppim.lib"
[21] "nppist.lib" "nppisu.lib"
[23] "nppitc.lib" "npps.lib"
[25] "nvblas.lib" "nvJitLink.lib"
[27] "nvJitLink_static.lib" "nvjpeg.lib"
[29] "nvml.lib" "nvptxcompiler_static.lib"
[31] "nvrtc-builtins_static.lib" "nvrtc.lib"
[33] "nvrtc_static.lib" "OpenCL.lib" |
I've enabled CUDA integration on the master branch for Linux. |
I wonder if this would work on Windows via WSL2? |
culibos and rt are clearly not on your machine. The relevant part of the compilation is here: https://github.com/bnosac/audio.whisper/blob/master/src/Makevars#L152-L161. I looked at the latest changes in the Makevars of whisper.cpp and they link to -L/usr/lib/wsl/lib; I've added that on the master branch. Maybe that allows running it on WSL2. Would be cool if you could test that. I've added the installation steps which I did on that Tesla V100 machine on AWS at the end of #27 (comment).
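For anyone testing this on WSL2: the Windows NVIDIA driver exposes its CUDA driver libraries under /usr/lib/wsl/lib (hence that -L flag); a quick check from R, assuming a standard WSL2 setup:

> list.files("/usr/lib/wsl/lib", pattern = "^libcuda")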
I'm not sure I have admin permissions required to enable WSL on my work computer (unfortunately my home PC does not have an RTX) but I will try and let you know. |
Ok, I was able to get WSL going on my work computer. I installed Ubuntu Jammy Jellyfish.
-- https://docs.nvidia.com/cuda/wsl-user-guide/index.html
Now to try to install audio.whisper:
Trace:
Not sure why it isn't finding nvcc.
Brave 👍 going down the rabbit hole of installing NVIDIA drivers and CUDA.
I guess it's not really in the path:

> Sys.getenv("PATH")
[1] "/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/snap/bin:/usr/lib/rstudio-server/bin/quarto/bin:/usr/lib/rstudio-server/bin/postback:/usr/lib/rstudio-server/bin/postback"

Here is a successful (!) trace:

> Sys.setenv(PATH = sprintf("%s:/usr/local/cuda-12.4/bin", Sys.getenv("PATH")))
> Sys.setenv(CUDA_PATH = "/usr/local/cuda-12.4")
> Sys.setenv(WHISPER_CUBLAS = "1")
> remotes::install_github("bnosac/audio.whisper", ref = "0.3.2", force = TRUE)
Downloading GitHub repo bnosac/audio.whisper@0.3.2
── R CMD build ────────────────────────────────────────────────────────────────────────────────────────────────────────
✔ checking for file ‘/tmp/RtmpEPMVPm/remotes1f36f9bca10/bnosac-audio.whisper-8d57d02/DESCRIPTION’ ...
─ preparing ‘audio.whisper’:
✔ checking DESCRIPTION meta-information ...
─ cleaning src
─ checking for LF line-endings in source and make files and shell scripts
─ checking for empty or unneeded directories
─ building ‘audio.whisper_0.3.2.tar.gz’
Installing package into ‘/home/jmgirard/R/x86_64-pc-linux-gnu-library/4.3’
(as ‘lib’ is unspecified)
* installing *source* package ‘audio.whisper’ ...
** using staged installation
** libs
using C++ compiler: ‘g++ (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0’
using C++11
I whisper.cpp build info:
I UNAME_S: Linux
I UNAME_P: x86_64
I UNAME_M: x86_64
I PKG_CFLAGS: -O3 -mavx -mf16c -msse3 -mssse3 -DGGML_USE_CUBLAS -I"/usr/local/cuda-12.4/include" -I"/usr/local/cuda-12.4/targets/x86_64-linux/include" -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -pthread
I PKG_CPPFLAGS: -O3 -mavx -mf16c -msse3 -mssse3 -DGGML_USE_CUBLAS -I"/usr/local/cuda-12.4/include" -I"/usr/local/cuda-12.4/targets/x86_64-linux/include" -DSTRICT_R_HEADERS -I./dr_libs -I./whisper_cpp -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -pthread
I PKG_LIBS: -lcuda -lcublas -lculibos -lcudart -lcublasLt -lpthread -ldl -lrt -L"/usr/local/cuda-12.4/lib64" -L/opt/cuda/lib64 -L"/usr/local/cuda-12.4/targets/x86_64-linux/lib" -L/usr/lib/wsl/lib
gcc -I"/usr/share/R/include" -DNDEBUG -O3 -mavx -mf16c -msse3 -mssse3 -DGGML_USE_CUBLAS -I"/usr/local/cuda-12.4/include" -I"/usr/local/cuda-12.4/targets/x86_64-linux/include" -DSTRICT_R_HEADERS -I./dr_libs -I./whisper_cpp -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -pthread -I'/usr/lib/R/site-library/Rcpp/include' -O3 -mavx -mf16c -msse3 -mssse3 -DGGML_USE_CUBLAS -I"/usr/local/cuda-12.4/include" -I"/usr/local/cuda-12.4/targets/x86_64-linux/include" -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -pthread -fpic -g -O2 -ffile-prefix-map=/build/r-base-14Q6vq/r-base-4.3.3=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -c whisper_cpp/ggml-quants.c -o whisper_cpp/ggml-quants.o
gcc -I"/usr/share/R/include" -DNDEBUG -O3 -mavx -mf16c -msse3 -mssse3 -DGGML_USE_CUBLAS -I"/usr/local/cuda-12.4/include" -I"/usr/local/cuda-12.4/targets/x86_64-linux/include" -DSTRICT_R_HEADERS -I./dr_libs -I./whisper_cpp -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -pthread -I'/usr/lib/R/site-library/Rcpp/include' -O3 -mavx -mf16c -msse3 -mssse3 -DGGML_USE_CUBLAS -I"/usr/local/cuda-12.4/include" -I"/usr/local/cuda-12.4/targets/x86_64-linux/include" -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -pthread -fpic -g -O2 -ffile-prefix-map=/build/r-base-14Q6vq/r-base-4.3.3=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -c whisper_cpp/ggml-backend.c -o whisper_cpp/ggml-backend.o
gcc -I"/usr/share/R/include" -DNDEBUG -O3 -mavx -mf16c -msse3 -mssse3 -DGGML_USE_CUBLAS -I"/usr/local/cuda-12.4/include" -I"/usr/local/cuda-12.4/targets/x86_64-linux/include" -DSTRICT_R_HEADERS -I./dr_libs -I./whisper_cpp -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -pthread -I'/usr/lib/R/site-library/Rcpp/include' -O3 -mavx -mf16c -msse3 -mssse3 -DGGML_USE_CUBLAS -I"/usr/local/cuda-12.4/include" -I"/usr/local/cuda-12.4/targets/x86_64-linux/include" -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -pthread -fpic -g -O2 -ffile-prefix-map=/build/r-base-14Q6vq/r-base-4.3.3=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -c whisper_cpp/ggml-alloc.c -o whisper_cpp/ggml-alloc.o
gcc -I"/usr/share/R/include" -DNDEBUG -O3 -mavx -mf16c -msse3 -mssse3 -DGGML_USE_CUBLAS -I"/usr/local/cuda-12.4/include" -I"/usr/local/cuda-12.4/targets/x86_64-linux/include" -DSTRICT_R_HEADERS -I./dr_libs -I./whisper_cpp -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -pthread -I'/usr/lib/R/site-library/Rcpp/include' -O3 -mavx -mf16c -msse3 -mssse3 -DGGML_USE_CUBLAS -I"/usr/local/cuda-12.4/include" -I"/usr/local/cuda-12.4/targets/x86_64-linux/include" -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -pthread -fpic -g -O2 -ffile-prefix-map=/build/r-base-14Q6vq/r-base-4.3.3=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -c whisper_cpp/ggml.c -o whisper_cpp/ggml.o
g++ -std=gnu++11 -I"/usr/share/R/include" -DNDEBUG -O3 -mavx -mf16c -msse3 -mssse3 -DGGML_USE_CUBLAS -I"/usr/local/cuda-12.4/include" -I"/usr/local/cuda-12.4/targets/x86_64-linux/include" -DSTRICT_R_HEADERS -I./dr_libs -I./whisper_cpp -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -pthread -I'/usr/lib/R/site-library/Rcpp/include' -fpic -g -O2 -ffile-prefix-map=/build/r-base-14Q6vq/r-base-4.3.3=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -c whisper_cpp/whisper.cpp -o whisper_cpp/whisper.o
g++ -std=gnu++11 -I"/usr/share/R/include" -DNDEBUG -O3 -mavx -mf16c -msse3 -mssse3 -DGGML_USE_CUBLAS -I"/usr/local/cuda-12.4/include" -I"/usr/local/cuda-12.4/targets/x86_64-linux/include" -DSTRICT_R_HEADERS -I./dr_libs -I./whisper_cpp -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -pthread -I'/usr/lib/R/site-library/Rcpp/include' -fpic -g -O2 -ffile-prefix-map=/build/r-base-14Q6vq/r-base-4.3.3=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -c whisper_cpp/common-ggml.cpp -o whisper_cpp/common-ggml.o
g++ -std=gnu++11 -I"/usr/share/R/include" -DNDEBUG -O3 -mavx -mf16c -msse3 -mssse3 -DGGML_USE_CUBLAS -I"/usr/local/cuda-12.4/include" -I"/usr/local/cuda-12.4/targets/x86_64-linux/include" -DSTRICT_R_HEADERS -I./dr_libs -I./whisper_cpp -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -pthread -I'/usr/lib/R/site-library/Rcpp/include' -fpic -g -O2 -ffile-prefix-map=/build/r-base-14Q6vq/r-base-4.3.3=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -c whisper_cpp/common.cpp -o whisper_cpp/common.o
g++ -std=gnu++11 -I"/usr/share/R/include" -DNDEBUG -O3 -mavx -mf16c -msse3 -mssse3 -DGGML_USE_CUBLAS -I"/usr/local/cuda-12.4/include" -I"/usr/local/cuda-12.4/targets/x86_64-linux/include" -DSTRICT_R_HEADERS -I./dr_libs -I./whisper_cpp -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -pthread -I'/usr/lib/R/site-library/Rcpp/include' -fpic -g -O2 -ffile-prefix-map=/build/r-base-14Q6vq/r-base-4.3.3=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -c rcpp_whisper.cpp -o rcpp_whisper.o
g++ -std=gnu++11 -I"/usr/share/R/include" -DNDEBUG -O3 -mavx -mf16c -msse3 -mssse3 -DGGML_USE_CUBLAS -I"/usr/local/cuda-12.4/include" -I"/usr/local/cuda-12.4/targets/x86_64-linux/include" -DSTRICT_R_HEADERS -I./dr_libs -I./whisper_cpp -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -pthread -I'/usr/lib/R/site-library/Rcpp/include' -fpic -g -O2 -ffile-prefix-map=/build/r-base-14Q6vq/r-base-4.3.3=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -c RcppExports.cpp -o RcppExports.o
nvcc --forward-unknown-to-host-compiler -arch=native -O3 -mavx -mf16c -msse3 -mssse3 -DGGML_USE_CUBLAS -I"/usr/local/cuda-12.4/include" -I"/usr/local/cuda-12.4/targets/x86_64-linux/include" -DSTRICT_R_HEADERS -I./dr_libs -I./whisper_cpp -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -pthread -I"/usr/share/R/include" -fPIC -c whisper_cpp/ggml-cuda.cu -o whisper_cpp/ggml-cuda.o
g++ -std=gnu++11 -shared -L/usr/lib/R/lib -Wl,-Bsymbolic-functions -flto=auto -ffat-lto-objects -flto=auto -Wl,-z,relro -o audio.whisper.so whisper_cpp/ggml-quants.o whisper_cpp/ggml-backend.o whisper_cpp/ggml-alloc.o whisper_cpp/ggml.o whisper_cpp/whisper.o whisper_cpp/common-ggml.o whisper_cpp/common.o rcpp_whisper.o RcppExports.o whisper_cpp/ggml-cuda.o -lcuda -lcublas -lculibos -lcudart -lcublasLt -lpthread -ldl -lrt -L/usr/local/cuda-12.4/lib64 -L/opt/cuda/lib64 -L/usr/local/cuda-12.4/targets/x86_64-linux/lib -L/usr/lib/wsl/lib -L/usr/lib/R/lib -lR
installing to /home/jmgirard/R/x86_64-pc-linux-gnu-library/4.3/00LOCK-audio.whisper/00new/audio.whisper/libs
** R
** inst
** byte-compile and prepare package for lazy loading
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded from temporary location
** checking absolute paths in shared objects and dynamic libraries
** testing if installed package can be loaded from final location
** testing if installed package keeps a record of temporary installation path
* DONE (audio.whisper)
> Sys.setenv(PATH = sprintf("%s:/usr/local/cuda-12.4/bin", Sys.getenv("PATH")))
> Sys.setenv(CUDA_PATH = "/usr/local/cuda-12.4")
> Sys.setenv(WHISPER_CUBLAS = "1")
> remotes::install_github("bnosac/audio.whisper", ref = "0.3.2", force = TRUE)
>
> library(av)
> download.file(url = "https://www.ubu.com/media/sound/dec_francis/Dec-Francis-E_rant1.mp3", destfile = "rant1.mp3", mode = "wb")
> av_audio_convert("rant1.mp3", output = "output.wav", format = "wav", sample_rate = 16000)
>
> library(audio.whisper)
> model <- whisper("medium", use_gpu = TRUE)
> trans <- predict(model, newdata = "output.wav", language = "en", n_threads = 1)
> trans$timing
$transcription_start
[1] "2024-03-06 09:08:29 CST"
$transcription_end
[1] "2024-03-06 09:09:12 CST"
$transcription_duration
Time difference of 0.7129008 mins

0.71 min (use_gpu = TRUE, n_threads = 1)
For large-v3 with CUDA, I complete the above in 0.94 min. But for large-v3-q5_0 with CUDA, I get 34.96 min. I thought the point of quantized models was to be faster/more efficient? Do I need to install something else (e.g., ONNX) to unlock this benefit of quantized models?
Good to see that CUDA works on WSL as well, and thanks for listing exactly the steps you took. I've asked at whisper.cpp to see if I could directly compile it alongside the Rtools toolchain (see ggerganov/whisper.cpp#1922), but I think currently WSL seems to be the only way, unless somehow we use cmake in the build process and rely on a compiler other than R's default on Windows. While you are timing things as well …
I can respond to the rest later today, but here is a quick answer:
> library(audio.whisper)
> model <- whisper("medium", use_gpu = TRUE)
whisper_init_from_file_with_params_no_state: loading model from '/home/jmgirard/ggml-medium.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab = 51865
whisper_model_load: n_audio_ctx = 1500
whisper_model_load: n_audio_state = 1024
whisper_model_load: n_audio_head = 16
whisper_model_load: n_audio_layer = 24
whisper_model_load: n_text_ctx = 448
whisper_model_load: n_text_state = 1024
whisper_model_load: n_text_head = 16
whisper_model_load: n_text_layer = 24
whisper_model_load: n_mels = 80
whisper_model_load: ftype = 1
whisper_model_load: qntvr = 0
whisper_model_load: type = 4 (medium)
whisper_model_load: adding 1608 extra tokens
whisper_model_load: n_langs = 99
ggml_init_cublas: GGML_CUDA_FORCE_MMQ: no
ggml_init_cublas: CUDA_USE_TENSOR_CORES: yes
ggml_init_cublas: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 3050, compute capability 8.6, VMM: yes
whisper_backend_init: using CUDA backend
whisper_model_load: CUDA buffer size = 1533.52 MB
whisper_model_load: model size = 1533.14 MB
whisper_backend_init: using CUDA backend
whisper_init_state: kv self size = 132.12 MB
whisper_init_state: kv cross size = 147.46 MB
whisper_init_state: compute buffer (conv) = 25.61 MB
whisper_init_state: compute buffer (encode) = 170.28 MB
whisper_init_state: compute buffer (cross) = 7.85 MB
whisper_init_state: compute buffer (decode) = 98.32 MB
> download.file("https://github.com/jwijffels/example/raw/main/example.wav", "example.wav")
trying URL 'https://github.com/jwijffels/example/raw/main/example.wav'
Content type 'application/octet-stream' length 9605198 bytes (9.2 MB)
==================================================
downloaded 9.2 MB
> trans <- predict(model, newdata = "example.wav", language = "en", n_threads = 4)
system_info: n_threads = 4 / 8 | AVX = 1 | AVX2 = 0 | AVX512 = 0 | FMA = 0 | NEON = 0 | ARM_FMA = 0 | METAL = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | CUDA = 1 | COREML = 0 | OPENVINO = 0 |
Processing example.wav (4802560 samples, 300.16 sec), lang = en, translate = 0, timestamps = 0, beam_size = -1, best_of = 5
> trans$timing
$transcription_start
[1] "2024-03-06 14:56:13 CST"
$transcription_end
[1] "2024-03-06 14:56:34 CST"
$transcription_duration
Time difference of 0.3399141 mins
> trans <- predict(
model,
newdata = "example.wav",
language = "en",
n_threads = 4,
n_processors = 4
)
Error: C stack usage 746289549788 is too close to the limit
Yes, probably that should be …
Closing, as CUDA integration is enabled on the master branch since audio.whisper version 0.3.2 and works on Linux and the Windows Subsystem for Linux. As a wrap-up:
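The recipe that worked in this thread (Linux / WSL2, with the CUDA toolkit under /usr/local/cuda-12.4 in the example above; adjust the version and paths to your installation):

> Sys.setenv(PATH = sprintf("%s:/usr/local/cuda-12.4/bin", Sys.getenv("PATH")))
> Sys.setenv(CUDA_PATH = "/usr/local/cuda-12.4")
> Sys.setenv(WHISPER_CUBLAS = "1")
> remotes::install_github("bnosac/audio.whisper", ref = "0.3.2", force = TRUE)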
If you want to use your GPU when doing transcriptions, don't forget to set the argument use_gpu = TRUE, otherwise your CPU will be used. E.g., following the example earlier in this thread:
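> library(audio.whisper)
> model <- whisper("medium", use_gpu = TRUE)
> trans <- predict(model, newdata = "output.wav", language = "en", n_threads = 1)

(output.wav here is the 16 kHz wav produced with av_audio_convert earlier in the thread.)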
TODO
Next, integrate https://github.com/ggerganov/whisper.cpp/blob/master/Makefile#L210-L222 into Makevars.