latest ggml version sync #20
base: master
Conversation
It'll be maybe 10 days before I can properly review this but it looks like really solid work and I will make sure it gets merged.
Vulkan/OpenCL support would be extremely welcome, but it will likely require new implementations to be added to ggml for some ops.

Agreed, support for other backends should come in separate PRs to avoid polluting this one. It will also take me more time, as this is the very first time I'm dealing with compute shaders and tensors. Tell me if I'm wrong, but porting

I expect it will be similar to the ops needed to add Metal support, listed in #14. But I haven't confirmed there aren't additional ops missing Vulkan/OpenCL implementations.

Vulkan can support macOS quite easily, right?
Thanks for your hard work on this so far.

I looked at all the code and it looks fine, barring one comment I had on CMakeLists.txt.

I tested with an Intel CPU and an Nvidia 1070 Ti. It passes all tests on the Nvidia card, but fails the autoregressive model test on CPU. Generation quality was fine on both GPU and CPU for both short and long phrases. The GPU and CPU tests compare against the same reference values, so very likely ggml has changed in a way that makes some op's CPU output inconsistent with its GPU equivalent.

You can uncomment the tests near the beginning of main to run them in CPU mode. The function print_all_tensors lets you print the first and last 3 elements of any tensor in the autoregressive graph that you tag with ggml_set_name. There are commented-out instances of each function in main.cpp to give an idea of how to use them. So we need to find the point of divergence between the old and new CPU process. Help with this would be awesome, but I will work on it when I can.
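The comparison strategy described above (sample the first and last 3 elements of a tagged tensor and diff them between two runs) can be sketched generically. This is an illustrative sketch with hypothetical helper names; the real logic lives in print_all_tensors in main.cpp:

```python
# Sketch of the divergence check described above: compare the first and
# last 3 elements of a flattened tensor between two runs.
# head_tail and diverges are illustrative names, not tortoise.cpp functions.

def head_tail(values, n=3):
    """Return the first and last n elements of a flat tensor."""
    return list(values[:n]), list(values[-n:])

def diverges(a, b, tol=1e-4):
    """True if the head/tail samples of two tensors differ beyond tol."""
    (ha, ta), (hb, tb) = head_tail(a), head_tail(b)
    return any(abs(x - y) > tol for x, y in zip(ha + ta, hb + tb))
```

Tagging the same tensor in both the old and new CPU paths and feeding the printed values into a check like this narrows down the first op where the outputs split.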
```cmake
set(CMAKE_EXPORT_COMPILE_COMMANDS ON)

add_executable(tortoise main.cpp common.cpp)
option(DEBUG "Debug mode" OFF)
option(GGML_CUDA "cuda mode" OFF)
```
Should there be a GGML_METAL option similarly declared here?
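For illustration, such an option could mirror the existing GGML_CUDA line. This is a hypothetical sketch, not part of the PR's CMakeLists.txt:

```cmake
# Hypothetical sketch, mirroring the existing GGML_CUDA option above.
option(GGML_METAL "metal mode" OFF)
```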
I have isolated the divergence in behavior on CPU to this op (line 2601 in f21e5d5).

The ggml_mul_mat op gives different outputs for the test input between the version of ggml in your commit and the version the master branch of tortoise.cpp currently points to. This could be related to a change in how ggml implements matrix multiplication. Our options are either to create separate test cases for GPU and CPU, changing the CPU test cases to match the current CPU behavior, or to isolate the divergence in ggml_mul_mat behavior between vanilla ggml versions and try to get it fixed upstream. I lean somewhat towards creating separate CPU and GPU tests so we can keep development moving.
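One way to decide which ggml version's ggml_mul_mat output to trust is to cross-check both against a naive reference multiplication. This is a generic sketch, not tortoise.cpp code; ggml's actual memory layout and transposition conventions differ:

```python
# Naive reference matrix multiplication for cross-checking backend outputs.
# Illustrative only; ggml_mul_mat's real layout/transposition conventions differ.

def matmul(a, b):
    """Multiply row-major matrices a (m x k) and b (k x n)."""
    m, k, n = len(a), len(b), len(b[0])
    assert all(len(row) == k for row in a), "inner dimensions must match"
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(n)]
            for i in range(m)]
```

Whichever ggml version agrees with the reference on the failing test input is the one the CPU test cases should ultimately target.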
Just to note, I'm also having a hard time porting it to Vulkan. After implementing the missing *_1d() functions, the output is total garbage (white noise + buzzing sound).

The first thing I'd recommend trying is seeing whether any of the tests pass with the Vulkan process by leaving only one uncommented at a time; that could help isolate the divergence in the Vulkan path.
This is an attempt to rebase on the latest commit of the ggml master branch. My primary goal is to add Vulkan/OpenCL support, as I only have AMD GPUs.

Tested and working well on CPU, but I don't have an Nvidia card around, so I cannot test the CUDA backend.

This needs the following PR on ggml to work: balisujohn/ggml#1