Integrate CUDA support #27
We are now at the point where the following should at least work, once we integrate https://github.com/ggerganov/whisper.cpp/blob/master/Makefile#L210-L222 into Makevars and compile the CUDA parts.
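For reference, a minimal sketch of how such a CUDA-enabled install could then be triggered from R (the WHISPER_CUBLAS and CUDA_PATH environment variables follow what is used later in this thread; the toolkit location is an assumption for your system):

> Sys.setenv(WHISPER_CUBLAS = "1")          # opt in to the cuBLAS build
> Sys.setenv(CUDA_PATH = "/usr/local/cuda") # assumed toolkit location
> remotes::install_github("bnosac/audio.whisper", force = TRUE)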
Just a heads up that I have access to a Windows machine with CUDA, in case it would be helpful for me to do tests and benchmarks like I did for macOS.
That would indeed be great.
OS Name: Microsoft Windows 10 Enterprise

Trace:

> remotes::install_github("bnosac/audio.whisper", force = TRUE)
The downloaded binary packages are in
C:\Users\j553g371\AppData\Local\Temp\RtmpIhem3s\downloaded_packages
Running `R CMD build`...
* checking for file 'C:\Users\j553g371\AppData\Local\Temp\RtmpIhem3s\remotes3074262171b7\bnosac-audio.whisper-ecdb06d/DESCRIPTION' ... OK
* preparing 'audio.whisper':
* checking DESCRIPTION meta-information ... OK
* cleaning src
* checking for LF line-endings in source and make files and shell scripts
* checking for empty or unneeded directories
* building 'audio.whisper_0.3.2.tar.gz'
* installing *source* package 'audio.whisper' ...
** using staged installation
** libs
using C++ compiler: 'G__~1.EXE (GCC) 12.3.0'
using C++11
I whisper.cpp build info:
I UNAME_S: MSYS_NT-10.0-19044
I UNAME_P: unknown
I UNAME_M: x86_64
I PKG_CFLAGS: -mavx -mf16c -msse3 -mssse3 -D_XOPEN_SOURCE=600
I PKG_CPPFLAGS: -mavx -mf16c -msse3 -mssse3 -DSTRICT_R_HEADERS -I./dr_libs -I./whisper_cpp -D_XOPEN_SOURCE=600
I PKG_LIBS:
gcc -I"C:/Users/j553g371/AppData/Local/Programs/R/R-43~1.2/include" -DNDEBUG -mavx -mf16c -msse3 -mssse3 -DSTRICT_R_HEADERS -I./dr_libs -I./whisper_cpp -D_XOPEN_SOURCE=600 -I'C:/Users/j553g371/AppData/Local/Programs/R/R-4.3.2/library/Rcpp/include' -I"C:/rtools43/x86_64-w64-mingw32.static.posix/include" -mavx -mf16c -msse3 -mssse3 -D_XOPEN_SOURCE=600 -O2 -Wall -mfpmath=sse -msse2 -mstackrealign -c whisper_cpp/ggml-quants.c -o whisper_cpp/ggml-quants.o
whisper_cpp/ggml-quants.c:1337:14: warning: 'make_qkx1_quants' defined but not used [-Wunused-function]
1337 | static float make_qkx1_quants(int n, int nmax, const float * restrict x, uint8_t * restrict L, float * restrict the_min,
| ^~~~~~~~~~~~~~~~
gcc -I"C:/Users/j553g371/AppData/Local/Programs/R/R-43~1.2/include" -DNDEBUG -mavx -mf16c -msse3 -mssse3 -DSTRICT_R_HEADERS -I./dr_libs -I./whisper_cpp -D_XOPEN_SOURCE=600 -I'C:/Users/j553g371/AppData/Local/Programs/R/R-4.3.2/library/Rcpp/include' -I"C:/rtools43/x86_64-w64-mingw32.static.posix/include" -mavx -mf16c -msse3 -mssse3 -D_XOPEN_SOURCE=600 -O2 -Wall -mfpmath=sse -msse2 -mstackrealign -c whisper_cpp/ggml-backend.c -o whisper_cpp/ggml-backend.o
whisper_cpp/ggml-backend.c:841:13: warning: 'sched_print_assignments' defined but not used [-Wunused-function]
841 | static void sched_print_assignments(ggml_backend_sched_t sched, struct ggml_cgraph * graph) {
| ^~~~~~~~~~~~~~~~~~~~~~~
gcc -I"C:/Users/j553g371/AppData/Local/Programs/R/R-43~1.2/include" -DNDEBUG -mavx -mf16c -msse3 -mssse3 -DSTRICT_R_HEADERS -I./dr_libs -I./whisper_cpp -D_XOPEN_SOURCE=600 -I'C:/Users/j553g371/AppData/Local/Programs/R/R-4.3.2/library/Rcpp/include' -I"C:/rtools43/x86_64-w64-mingw32.static.posix/include" -mavx -mf16c -msse3 -mssse3 -D_XOPEN_SOURCE=600 -O2 -Wall -mfpmath=sse -msse2 -mstackrealign -c whisper_cpp/ggml-alloc.c -o whisper_cpp/ggml-alloc.o
gcc -I"C:/Users/j553g371/AppData/Local/Programs/R/R-43~1.2/include" -DNDEBUG -mavx -mf16c -msse3 -mssse3 -DSTRICT_R_HEADERS -I./dr_libs -I./whisper_cpp -D_XOPEN_SOURCE=600 -I'C:/Users/j553g371/AppData/Local/Programs/R/R-4.3.2/library/Rcpp/include' -I"C:/rtools43/x86_64-w64-mingw32.static.posix/include" -mavx -mf16c -msse3 -mssse3 -D_XOPEN_SOURCE=600 -O2 -Wall -mfpmath=sse -msse2 -mstackrealign -c whisper_cpp/ggml.c -o whisper_cpp/ggml.o
whisper_cpp/ggml.c:17593:13: warning: 'ggml_opt_get_grad' defined but not used [-Wunused-function]
17593 | static void ggml_opt_get_grad(int np, struct ggml_tensor * const ps[], float * g) {
| ^~~~~~~~~~~~~~~~~
g++ -std=gnu++11 -I"C:/Users/j553g371/AppData/Local/Programs/R/R-43~1.2/include" -DNDEBUG -mavx -mf16c -msse3 -mssse3 -DSTRICT_R_HEADERS -I./dr_libs -I./whisper_cpp -D_XOPEN_SOURCE=600 -I'C:/Users/j553g371/AppData/Local/Programs/R/R-4.3.2/library/Rcpp/include' -I"C:/rtools43/x86_64-w64-mingw32.static.posix/include" -O2 -Wall -mfpmath=sse -msse2 -mstackrealign -c whisper_cpp/whisper.cpp -o whisper_cpp/whisper.o
whisper_cpp/whisper.cpp:203:29: warning: 'ggml_tensor* ggml_mul_mat_pad(ggml_context*, ggml_tensor*, ggml_tensor*, int)' defined but not used [-Wunused-function]
203 | static struct ggml_tensor * ggml_mul_mat_pad(struct ggml_context * ctx, struct ggml_tensor * x, struct ggml_tensor * y, int pad = 32) {
| ^~~~~~~~~~~~~~~~
g++ -std=gnu++11 -I"C:/Users/j553g371/AppData/Local/Programs/R/R-43~1.2/include" -DNDEBUG -mavx -mf16c -msse3 -mssse3 -DSTRICT_R_HEADERS -I./dr_libs -I./whisper_cpp -D_XOPEN_SOURCE=600 -I'C:/Users/j553g371/AppData/Local/Programs/R/R-4.3.2/library/Rcpp/include' -I"C:/rtools43/x86_64-w64-mingw32.static.posix/include" -O2 -Wall -mfpmath=sse -msse2 -mstackrealign -c whisper_cpp/common-ggml.cpp -o whisper_cpp/common-ggml.o
g++ -std=gnu++11 -I"C:/Users/j553g371/AppData/Local/Programs/R/R-43~1.2/include" -DNDEBUG -mavx -mf16c -msse3 -mssse3 -DSTRICT_R_HEADERS -I./dr_libs -I./whisper_cpp -D_XOPEN_SOURCE=600 -I'C:/Users/j553g371/AppData/Local/Programs/R/R-4.3.2/library/Rcpp/include' -I"C:/rtools43/x86_64-w64-mingw32.static.posix/include" -O2 -Wall -mfpmath=sse -msse2 -mstackrealign -c whisper_cpp/common.cpp -o whisper_cpp/common.o
g++ -std=gnu++11 -I"C:/Users/j553g371/AppData/Local/Programs/R/R-43~1.2/include" -DNDEBUG -mavx -mf16c -msse3 -mssse3 -DSTRICT_R_HEADERS -I./dr_libs -I./whisper_cpp -D_XOPEN_SOURCE=600 -I'C:/Users/j553g371/AppData/Local/Programs/R/R-4.3.2/library/Rcpp/include' -I"C:/rtools43/x86_64-w64-mingw32.static.posix/include" -O2 -Wall -mfpmath=sse -msse2 -mstackrealign -c rcpp_whisper.cpp -o rcpp_whisper.o
g++ -std=gnu++11 -I"C:/Users/j553g371/AppData/Local/Programs/R/R-43~1.2/include" -DNDEBUG -mavx -mf16c -msse3 -mssse3 -DSTRICT_R_HEADERS -I./dr_libs -I./whisper_cpp -D_XOPEN_SOURCE=600 -I'C:/Users/j553g371/AppData/Local/Programs/R/R-4.3.2/library/Rcpp/include' -I"C:/rtools43/x86_64-w64-mingw32.static.posix/include" -O2 -Wall -mfpmath=sse -msse2 -mstackrealign -c RcppExports.cpp -o RcppExports.o
g++ -shared -s -static-libgcc -o audio.whisper.dll tmp.def whisper_cpp/ggml-quants.o whisper_cpp/ggml-backend.o whisper_cpp/ggml-alloc.o whisper_cpp/ggml.o whisper_cpp/whisper.o whisper_cpp/common-ggml.o whisper_cpp/common.o rcpp_whisper.o RcppExports.o -LC:/rtools43/x86_64-w64-mingw32.static.posix/lib/x64 -LC:/rtools43/x86_64-w64-mingw32.static.posix/lib -LC:/Users/j553g371/AppData/Local/Programs/R/R-43~1.2/bin/x64 -lR
installing to C:/Users/j553g371/AppData/Local/Programs/R/R-4.3.2/library/00LOCK-audio.whisper/00new/audio.whisper/libs/x64
** R
** inst
** byte-compile and prepare package for lazy loading
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded from temporary location
** testing if installed package can be loaded from final location
** testing if installed package keeps a record of temporary installation path
* DONE (audio.whisper)

Use GPU:

> library(audio.whisper)
> model <- whisper("medium", use_gpu = TRUE)
trying URL 'https://huggingface.co/ggerganov/whisper.cpp/resolve/d15393806e24a74f60827e23e986f0c10750b358/ggml-medium.bin'
Content type 'application/octet-stream' length 1533763059 bytes (1462.7 MB)
downloaded 1462.7 MB
Downloading finished, model stored at 'C:/Users/j553g371/Documents/ggml-medium.bin'
whisper_init_from_file_with_params_no_state: loading model from 'C:/Users/j553g371/Documents/ggml-medium.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab = 51865
whisper_model_load: n_audio_ctx = 1500
whisper_model_load: n_audio_state = 1024
whisper_model_load: n_audio_head = 16
whisper_model_load: n_audio_layer = 24
whisper_model_load: n_text_ctx = 448
whisper_model_load: n_text_state = 1024
whisper_model_load: n_text_head = 16
whisper_model_load: n_text_layer = 24
whisper_model_load: n_mels = 80
whisper_model_load: ftype = 1
whisper_model_load: qntvr = 0
whisper_model_load: type = 4 (medium)
whisper_model_load: adding 1608 extra tokens
whisper_model_load: n_langs = 99
whisper_model_load: CPU buffer size = 1533.52 MB
whisper_model_load: model size = 1533.14 MB
whisper_init_state: kv self size = 132.12 MB
whisper_init_state: kv cross size = 147.46 MB
whisper_init_state: compute buffer (conv) = 25.61 MB
whisper_init_state: compute buffer (encode) = 170.28 MB
whisper_init_state: compute buffer (cross) = 7.85 MB
whisper_init_state: compute buffer (decode) = 98.32 MB
What does …
Can you show all files which are (recursively) at /c/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v12.2? Something like the sketch below. Does …
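A sketch of the kind of listing meant here, assuming R is the easiest route:

> list.files("C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v12.2", recursive = TRUE, full.names = TRUE)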
See attached for the full list of files.
Note as well that I have system variables for …
Thanks. With that information, I think I can add something to the Makevars to compile the CUDA source code.
There's nothing there with cuda, nvidia, or nv.
I've set up continuous integration with CUDA in the …
On Windows it looks like the nvcc executable from CUDA needs MSVC. I've now let it create a fat binary for all the GPU architectures which nvcc lists:
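If your nvcc version supports it, that list can be queried directly; a sketch from R (--list-gpu-arch is, to my knowledge, the nvcc flag that prints the supported virtual architectures):

> system("nvcc --list-gpu-arch")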
Error when trying to install without Visual Studio. Trace:
Trying to install MSVC now...
Yes, on Windows you need MSVC.
Trace:
Yep, got similar errors in run https://github.com/bnosac/audio.whisper/actions/runs/7965676963/job/21745662097

Trace for CUDA 12.2.0:
Trace for CUDA 11.8.0:
These R_ext/Complex.h errors for CUDA 11.8.0 are there because I made sure printing goes to the R console instead of stderr.
When disabling printing as indicated at #27 (comment) …
Notes when spinning up a p3.2xlarge (Tesla V100, 16GB GPU RAM - NVIDIA Corporation GV100GL [Tesla V100 SXM2 16GB]) on AWS.
Don't forget to set WHISPER_CUBLAS=1. Note to future self: installed CUDA and NVIDIA drivers as follows on Ubuntu 22.04: …
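Whatever install route is taken, a quick sanity check from R that the driver and toolkit are visible (nvidia-smi ships with the driver, nvcc with the toolkit; both are assumed to be on the PATH):

> system("nvidia-smi")     # does the driver see the GPU?
> system("nvcc --version") # is the toolkit on the PATH?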
Quick check to see if whisper.cpp itself provides the same information.
Same timing with whisper.cpp.
So, in conclusion: Tesla V100, 16GB GPU RAM. CUDA with R works on this Linux machine.
@jmgirard I think I'll already include the changes that allow installing and transcribing on Linux with CUDA, as that works. Will try later to see if we can make it work on Windows. But apparently it needs to link to culibos and rt (see #27 (comment)) and I don't know if that is available at all when you install the CUDA drivers on Windows. At least it is not available on the continuous integration run under C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8/lib/x64 (maybe it is on your machine?)
> list.files("C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v12.3/lib/x64")
[1] "cublas.lib" "cublasLt.lib"
[3] "cuda.lib" "cudadevrt.lib"
[5] "cudart.lib" "cudart_static.lib"
[7] "cufft.lib" "cufftw.lib"
[9] "cufilt.lib" "curand.lib"
[11] "cusolver.lib" "cusolverMg.lib"
[13] "cusparse.lib" "nppc.lib"
[15] "nppial.lib" "nppicc.lib"
[17] "nppidei.lib" "nppif.lib"
[19] "nppig.lib" "nppim.lib"
[21] "nppist.lib" "nppisu.lib"
[23] "nppitc.lib" "npps.lib"
[25] "nvblas.lib" "nvJitLink.lib"
[27] "nvJitLink_static.lib" "nvjpeg.lib"
[29] "nvml.lib" "nvptxcompiler_static.lib"
[31] "nvrtc-builtins_static.lib" "nvrtc.lib"
[33] "nvrtc_static.lib" "OpenCL.lib" |
I've enabled CUDA integration on the master branch for Linux. |
I wonder if this would work on Windows via WSL2? |
culibos and rt are clearly not on your machine. The relevant part of the compilation is here: https://github.com/bnosac/audio.whisper/blob/master/src/Makevars#L152-L161. I looked at the latest changes in the Makevars of whisper.cpp and they link to -L/usr/lib/wsl/lib; I've added that on the master branch. Maybe that allows running it on WSL2. Would be cool if you could test that. I've added the installation steps which I did on that Tesla V100 machine on AWS at the end of #27 (comment).
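For anyone testing this on WSL2: the Windows NVIDIA driver exposes its CUDA driver libraries under /usr/lib/wsl/lib (hence that -L flag); a quick check from R, assuming a standard WSL2 setup:

> list.files("/usr/lib/wsl/lib", pattern = "^libcuda")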
I'm not sure I have admin permissions required to enable WSL on my work computer (unfortunately my home PC does not have an RTX) but I will try and let you know. |
Ok, I was able to get WSL going on my work computer. I installed Ubuntu Jammy Jellyfish.
-- https://docs.nvidia.com/cuda/wsl-user-guide/index.html
Now to try to install audio.whisper:
Trace:
Not sure why it isn't finding nvcc.
Brave 👍 going down the rabbit hole of installing NVIDIA drivers and CUDA.
I guess it's not really in the path:

> Sys.getenv("PATH")
[1] "/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/snap/bin:/usr/lib/rstudio-server/bin/quarto/bin:/usr/lib/rstudio-server/bin/postback:/usr/lib/rstudio-server/bin/postback"

Here is a successful (!) trace:

> Sys.setenv(PATH = sprintf("%s:/usr/local/cuda-12.4/bin", Sys.getenv("PATH")))
> Sys.setenv(CUDA_PATH = "/usr/local/cuda-12.4")
> Sys.setenv(WHISPER_CUBLAS = "1")
> remotes::install_github("bnosac/audio.whisper", ref = "0.3.2", force = TRUE)
Downloading GitHub repo bnosac/audio.whisper@0.3.2
── R CMD build ────────────────────────────────────────────────────────────────────────────────────────────────────────
✔ checking for file ‘/tmp/RtmpEPMVPm/remotes1f36f9bca10/bnosac-audio.whisper-8d57d02/DESCRIPTION’ ...
─ preparing ‘audio.whisper’:
✔ checking DESCRIPTION meta-information ...
─ cleaning src
─ checking for LF line-endings in source and make files and shell scripts
─ checking for empty or unneeded directories
─ building ‘audio.whisper_0.3.2.tar.gz’
Installing package into ‘/home/jmgirard/R/x86_64-pc-linux-gnu-library/4.3’
(as ‘lib’ is unspecified)
* installing *source* package ‘audio.whisper’ ...
** using staged installation
** libs
using C++ compiler: ‘g++ (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0’
using C++11
I whisper.cpp build info:
I UNAME_S: Linux
I UNAME_P: x86_64
I UNAME_M: x86_64
I PKG_CFLAGS: -O3 -mavx -mf16c -msse3 -mssse3 -DGGML_USE_CUBLAS -I"/usr/local/cuda-12.4/include" -I"/usr/local/cuda-12.4/targets/x86_64-linux/include" -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -pthread
I PKG_CPPFLAGS: -O3 -mavx -mf16c -msse3 -mssse3 -DGGML_USE_CUBLAS -I"/usr/local/cuda-12.4/include" -I"/usr/local/cuda-12.4/targets/x86_64-linux/include" -DSTRICT_R_HEADERS -I./dr_libs -I./whisper_cpp -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -pthread
I PKG_LIBS: -lcuda -lcublas -lculibos -lcudart -lcublasLt -lpthread -ldl -lrt -L"/usr/local/cuda-12.4/lib64" -L/opt/cuda/lib64 -L"/usr/local/cuda-12.4/targets/x86_64-linux/lib" -L/usr/lib/wsl/lib
gcc -I"/usr/share/R/include" -DNDEBUG -O3 -mavx -mf16c -msse3 -mssse3 -DGGML_USE_CUBLAS -I"/usr/local/cuda-12.4/include" -I"/usr/local/cuda-12.4/targets/x86_64-linux/include" -DSTRICT_R_HEADERS -I./dr_libs -I./whisper_cpp -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -pthread -I'/usr/lib/R/site-library/Rcpp/include' -O3 -mavx -mf16c -msse3 -mssse3 -DGGML_USE_CUBLAS -I"/usr/local/cuda-12.4/include" -I"/usr/local/cuda-12.4/targets/x86_64-linux/include" -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -pthread -fpic -g -O2 -ffile-prefix-map=/build/r-base-14Q6vq/r-base-4.3.3=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -c whisper_cpp/ggml-quants.c -o whisper_cpp/ggml-quants.o
gcc -I"/usr/share/R/include" -DNDEBUG -O3 -mavx -mf16c -msse3 -mssse3 -DGGML_USE_CUBLAS -I"/usr/local/cuda-12.4/include" -I"/usr/local/cuda-12.4/targets/x86_64-linux/include" -DSTRICT_R_HEADERS -I./dr_libs -I./whisper_cpp -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -pthread -I'/usr/lib/R/site-library/Rcpp/include' -O3 -mavx -mf16c -msse3 -mssse3 -DGGML_USE_CUBLAS -I"/usr/local/cuda-12.4/include" -I"/usr/local/cuda-12.4/targets/x86_64-linux/include" -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -pthread -fpic -g -O2 -ffile-prefix-map=/build/r-base-14Q6vq/r-base-4.3.3=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -c whisper_cpp/ggml-backend.c -o whisper_cpp/ggml-backend.o
gcc -I"/usr/share/R/include" -DNDEBUG -O3 -mavx -mf16c -msse3 -mssse3 -DGGML_USE_CUBLAS -I"/usr/local/cuda-12.4/include" -I"/usr/local/cuda-12.4/targets/x86_64-linux/include" -DSTRICT_R_HEADERS -I./dr_libs -I./whisper_cpp -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -pthread -I'/usr/lib/R/site-library/Rcpp/include' -O3 -mavx -mf16c -msse3 -mssse3 -DGGML_USE_CUBLAS -I"/usr/local/cuda-12.4/include" -I"/usr/local/cuda-12.4/targets/x86_64-linux/include" -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -pthread -fpic -g -O2 -ffile-prefix-map=/build/r-base-14Q6vq/r-base-4.3.3=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -c whisper_cpp/ggml-alloc.c -o whisper_cpp/ggml-alloc.o
gcc -I"/usr/share/R/include" -DNDEBUG -O3 -mavx -mf16c -msse3 -mssse3 -DGGML_USE_CUBLAS -I"/usr/local/cuda-12.4/include" -I"/usr/local/cuda-12.4/targets/x86_64-linux/include" -DSTRICT_R_HEADERS -I./dr_libs -I./whisper_cpp -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -pthread -I'/usr/lib/R/site-library/Rcpp/include' -O3 -mavx -mf16c -msse3 -mssse3 -DGGML_USE_CUBLAS -I"/usr/local/cuda-12.4/include" -I"/usr/local/cuda-12.4/targets/x86_64-linux/include" -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -pthread -fpic -g -O2 -ffile-prefix-map=/build/r-base-14Q6vq/r-base-4.3.3=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -c whisper_cpp/ggml.c -o whisper_cpp/ggml.o
g++ -std=gnu++11 -I"/usr/share/R/include" -DNDEBUG -O3 -mavx -mf16c -msse3 -mssse3 -DGGML_USE_CUBLAS -I"/usr/local/cuda-12.4/include" -I"/usr/local/cuda-12.4/targets/x86_64-linux/include" -DSTRICT_R_HEADERS -I./dr_libs -I./whisper_cpp -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -pthread -I'/usr/lib/R/site-library/Rcpp/include' -fpic -g -O2 -ffile-prefix-map=/build/r-base-14Q6vq/r-base-4.3.3=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -c whisper_cpp/whisper.cpp -o whisper_cpp/whisper.o
g++ -std=gnu++11 -I"/usr/share/R/include" -DNDEBUG -O3 -mavx -mf16c -msse3 -mssse3 -DGGML_USE_CUBLAS -I"/usr/local/cuda-12.4/include" -I"/usr/local/cuda-12.4/targets/x86_64-linux/include" -DSTRICT_R_HEADERS -I./dr_libs -I./whisper_cpp -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -pthread -I'/usr/lib/R/site-library/Rcpp/include' -fpic -g -O2 -ffile-prefix-map=/build/r-base-14Q6vq/r-base-4.3.3=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -c whisper_cpp/common-ggml.cpp -o whisper_cpp/common-ggml.o
g++ -std=gnu++11 -I"/usr/share/R/include" -DNDEBUG -O3 -mavx -mf16c -msse3 -mssse3 -DGGML_USE_CUBLAS -I"/usr/local/cuda-12.4/include" -I"/usr/local/cuda-12.4/targets/x86_64-linux/include" -DSTRICT_R_HEADERS -I./dr_libs -I./whisper_cpp -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -pthread -I'/usr/lib/R/site-library/Rcpp/include' -fpic -g -O2 -ffile-prefix-map=/build/r-base-14Q6vq/r-base-4.3.3=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -c whisper_cpp/common.cpp -o whisper_cpp/common.o
g++ -std=gnu++11 -I"/usr/share/R/include" -DNDEBUG -O3 -mavx -mf16c -msse3 -mssse3 -DGGML_USE_CUBLAS -I"/usr/local/cuda-12.4/include" -I"/usr/local/cuda-12.4/targets/x86_64-linux/include" -DSTRICT_R_HEADERS -I./dr_libs -I./whisper_cpp -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -pthread -I'/usr/lib/R/site-library/Rcpp/include' -fpic -g -O2 -ffile-prefix-map=/build/r-base-14Q6vq/r-base-4.3.3=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -c rcpp_whisper.cpp -o rcpp_whisper.o
g++ -std=gnu++11 -I"/usr/share/R/include" -DNDEBUG -O3 -mavx -mf16c -msse3 -mssse3 -DGGML_USE_CUBLAS -I"/usr/local/cuda-12.4/include" -I"/usr/local/cuda-12.4/targets/x86_64-linux/include" -DSTRICT_R_HEADERS -I./dr_libs -I./whisper_cpp -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -pthread -I'/usr/lib/R/site-library/Rcpp/include' -fpic -g -O2 -ffile-prefix-map=/build/r-base-14Q6vq/r-base-4.3.3=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -c RcppExports.cpp -o RcppExports.o
nvcc --forward-unknown-to-host-compiler -arch=native -O3 -mavx -mf16c -msse3 -mssse3 -DGGML_USE_CUBLAS -I"/usr/local/cuda-12.4/include" -I"/usr/local/cuda-12.4/targets/x86_64-linux/include" -DSTRICT_R_HEADERS -I./dr_libs -I./whisper_cpp -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -pthread -I"/usr/share/R/include" -fPIC -c whisper_cpp/ggml-cuda.cu -o whisper_cpp/ggml-cuda.o
g++ -std=gnu++11 -shared -L/usr/lib/R/lib -Wl,-Bsymbolic-functions -flto=auto -ffat-lto-objects -flto=auto -Wl,-z,relro -o audio.whisper.so whisper_cpp/ggml-quants.o whisper_cpp/ggml-backend.o whisper_cpp/ggml-alloc.o whisper_cpp/ggml.o whisper_cpp/whisper.o whisper_cpp/common-ggml.o whisper_cpp/common.o rcpp_whisper.o RcppExports.o whisper_cpp/ggml-cuda.o -lcuda -lcublas -lculibos -lcudart -lcublasLt -lpthread -ldl -lrt -L/usr/local/cuda-12.4/lib64 -L/opt/cuda/lib64 -L/usr/local/cuda-12.4/targets/x86_64-linux/lib -L/usr/lib/wsl/lib -L/usr/lib/R/lib -lR
installing to /home/jmgirard/R/x86_64-pc-linux-gnu-library/4.3/00LOCK-audio.whisper/00new/audio.whisper/libs
** R
** inst
** byte-compile and prepare package for lazy loading
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded from temporary location
** checking absolute paths in shared objects and dynamic libraries
** testing if installed package can be loaded from final location
** testing if installed package keeps a record of temporary installation path
* DONE (audio.whisper)
> Sys.setenv(PATH = sprintf("%s:/usr/local/cuda-12.4/bin", Sys.getenv("PATH")))
> Sys.setenv(CUDA_PATH = "/usr/local/cuda-12.4")
> Sys.setenv(WHISPER_CUBLAS = "1")
> remotes::install_github("bnosac/audio.whisper", ref = "0.3.2", force = TRUE)
>
> library(av)
> download.file(url = "https://www.ubu.com/media/sound/dec_francis/Dec-Francis-E_rant1.mp3", destfile = "rant1.mp3", mode = "wb")
> av_audio_convert("rant1.mp3", output = "output.wav", format = "wav", sample_rate = 16000)
>
> library(audio.whisper)
> model <- whisper("medium", use_gpu = TRUE)
> trans <- predict(model, newdata = "output.wav", language = "en", n_threads = 1)
> trans$timing
$transcription_start
[1] "2024-03-06 09:08:29 CST"
$transcription_end
[1] "2024-03-06 09:09:12 CST"
$transcription_duration
Time difference of 0.7129008 mins

0.71 min (use_gpu = TRUE, n_threads = 1)
For large-v3 with CUDA, I complete the above in 0.94 min. But for large-v3-q5_0 with CUDA, I get 34.96 min. I thought the point of quantized models was to be faster/more efficient? Do I need to install something else (e.g., ONNX) to unlock this benefit of quantized models?
Good to see that CUDA works on WSL as well, and thanks for listing exactly the steps you took. I've asked at whisper.cpp to see if I could directly compile it alongside the Rtools toolchain (see ggerganov/whisper.cpp#1922), but I think currently WSL seems to be the only way, unless somehow we use cmake in the build process and rely on a compiler other than R's default on Windows. While you are timing things as well …
I can respond to the rest later today, but here is a quick answer:
> library(audio.whisper)
> model <- whisper("medium", use_gpu = TRUE)
whisper_init_from_file_with_params_no_state: loading model from '/home/jmgirard/ggml-medium.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab = 51865
whisper_model_load: n_audio_ctx = 1500
whisper_model_load: n_audio_state = 1024
whisper_model_load: n_audio_head = 16
whisper_model_load: n_audio_layer = 24
whisper_model_load: n_text_ctx = 448
whisper_model_load: n_text_state = 1024
whisper_model_load: n_text_head = 16
whisper_model_load: n_text_layer = 24
whisper_model_load: n_mels = 80
whisper_model_load: ftype = 1
whisper_model_load: qntvr = 0
whisper_model_load: type = 4 (medium)
whisper_model_load: adding 1608 extra tokens
whisper_model_load: n_langs = 99
ggml_init_cublas: GGML_CUDA_FORCE_MMQ: no
ggml_init_cublas: CUDA_USE_TENSOR_CORES: yes
ggml_init_cublas: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 3050, compute capability 8.6, VMM: yes
whisper_backend_init: using CUDA backend
whisper_model_load: CUDA buffer size = 1533.52 MB
whisper_model_load: model size = 1533.14 MB
whisper_backend_init: using CUDA backend
whisper_init_state: kv self size = 132.12 MB
whisper_init_state: kv cross size = 147.46 MB
whisper_init_state: compute buffer (conv) = 25.61 MB
whisper_init_state: compute buffer (encode) = 170.28 MB
whisper_init_state: compute buffer (cross) = 7.85 MB
whisper_init_state: compute buffer (decode) = 98.32 MB
> download.file("https://github.com/jwijffels/example/raw/main/example.wav", "example.wav")
trying URL 'https://github.com/jwijffels/example/raw/main/example.wav'
Content type 'application/octet-stream' length 9605198 bytes (9.2 MB)
==================================================
downloaded 9.2 MB
> trans <- predict(model, newdata = "example.wav", language = "en", n_threads = 4)
system_info: n_threads = 4 / 8 | AVX = 1 | AVX2 = 0 | AVX512 = 0 | FMA = 0 | NEON = 0 | ARM_FMA = 0 | METAL = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | CUDA = 1 | COREML = 0 | OPENVINO = 0 |
Processing example.wav (4802560 samples, 300.16 sec), lang = en, translate = 0, timestamps = 0, beam_size = -1, best_of = 5
> trans$timing
$transcription_start
[1] "2024-03-06 14:56:13 CST"
$transcription_end
[1] "2024-03-06 14:56:34 CST"
$transcription_duration
Time difference of 0.3399141 mins
> trans <- predict(
model,
newdata = "example.wav",
language = "en",
n_threads = 4,
n_processors = 4
)
Error: C stack usage 746289549788 is too close to the limit
Yes, probably that should be …
Closing, as CUDA integration is enabled on the master branch since audio.whisper version 0.3.2 and works on Linux and the Windows Subsystem for Linux. As a wrap-up:
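The recipe that worked in this thread (Linux / WSL2, with the CUDA toolkit under /usr/local/cuda-12.4 in the example above; adjust the version and paths to your installation):

> Sys.setenv(PATH = sprintf("%s:/usr/local/cuda-12.4/bin", Sys.getenv("PATH")))
> Sys.setenv(CUDA_PATH = "/usr/local/cuda-12.4")
> Sys.setenv(WHISPER_CUBLAS = "1")
> remotes::install_github("bnosac/audio.whisper", ref = "0.3.2", force = TRUE)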
If you want to use your GPU when doing transcriptions, don't forget to set the argument use_gpu = TRUE, otherwise your CPU will be used. E.g., following the example earlier in this thread:
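> library(audio.whisper)
> model <- whisper("medium", use_gpu = TRUE)
> trans <- predict(model, newdata = "output.wav", language = "en", n_threads = 1)

(output.wav here is the 16 kHz wav produced with av_audio_convert earlier in the thread.)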
TODO
Next, integrate https://github.com/ggerganov/whisper.cpp/blob/master/Makefile#L210-L222 into Makevars.