latest ggml version sync #20
base: master
Conversation
It'll be maybe 10 days before I can properly review this but it looks like really solid work and I will make sure it gets merged.
Vulkan/OpenCL support would be extremely welcome, but it will likely require new implementations to be added to ggml for some ops.

Agreed, support for other backends should come in separate PRs to avoid polluting this one. It will also take me more time, as this is the very first time I'm dealing with compute shaders and tensors. Tell me if I'm wrong, but porting

I expect it will be similar to the ops needed to add Metal support, listed in #14. But I haven't confirmed there aren't additional ops missing Vulkan/OpenCL implementations.

Vulkan can support macOS quite easily, right?
Thanks for your hard work on this so far.

I looked at all the code and it looks fine, barring one comment I had on CMakeLists.txt.

I tested with an Intel CPU and an Nvidia 1070 Ti. It passes all tests on the Nvidia card, but fails the autoregressive model test on CPU. Generation quality was fine on both GPU and CPU for both short and long phrases. The GPU and CPU tests compare against the same reference values, so very likely ggml has changed in a way that makes some op's CPU output inconsistent with its GPU equivalent.

You can uncomment the tests near the beginning of main to run them in CPU mode. The function print_all_tensors lets you print the first and last 3 elements of any tensor in the autoregressive graph that you tag with ggml_set_name. There are commented-out instances of each function in main.cpp to give an idea of how to use them. So we need to find the point of divergence between the old and new CPU process. Help with this would be awesome, but I will work on it when I can.
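The comparison strategy described above (sample the first and last 3 elements of a tagged tensor and diff them between two runs) can be sketched generically. This is an illustrative sketch with hypothetical helper names; the real logic lives in print_all_tensors in main.cpp:

```python
# Sketch of the divergence check described above: compare the first and
# last 3 elements of a flattened tensor between two runs.
# head_tail and diverges are illustrative names, not tortoise.cpp functions.

def head_tail(values, n=3):
    """Return the first and last n elements of a flat tensor."""
    return list(values[:n]), list(values[-n:])

def diverges(a, b, tol=1e-4):
    """True if the head/tail samples of two tensors differ beyond tol."""
    (ha, ta), (hb, tb) = head_tail(a), head_tail(b)
    return any(abs(x - y) > tol for x, y in zip(ha + ta, hb + tb))
```

Tagging the same tensor in both the old and new CPU paths and feeding the printed values into a check like this narrows down the first op where the outputs split.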
```cmake
set(CMAKE_EXPORT_COMPILE_COMMANDS ON)

add_executable(tortoise main.cpp common.cpp)
option(DEBUG "Debug mode" OFF)
option(GGML_CUDA "cuda mode" OFF)
```
Should there be a GGML_METAL option similarly declared here?
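For illustration, such an option could mirror the existing GGML_CUDA line. This is a hypothetical sketch, not part of the PR's CMakeLists.txt:

```cmake
# Hypothetical sketch, mirroring the existing GGML_CUDA option above.
option(GGML_METAL "metal mode" OFF)
```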
I have isolated the divergence in behavior on CPU to this op (line 2601 in f21e5d5).

The ggml_mul_mat op gives different outputs for the test input between the version of ggml in your commit and the version the master branch of tortoise.cpp currently points to. This could be related to a change in how ggml implements matrix multiplication. Our options are either to create separate test cases for GPU and CPU, changing the CPU test cases to match the current CPU behavior, or to isolate the divergence in ggml_mul_mat behavior between vanilla ggml versions and try to get it fixed upstream. I lean somewhat towards creating separate CPU and GPU tests so we can keep development moving.
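One way to decide which ggml version's ggml_mul_mat output to trust is to cross-check both against a naive reference multiplication. This is a generic sketch, not tortoise.cpp code; ggml's actual memory layout and transposition conventions differ:

```python
# Naive reference matrix multiplication for cross-checking backend outputs.
# Illustrative only; ggml_mul_mat's real layout/transposition conventions differ.

def matmul(a, b):
    """Multiply row-major matrices a (m x k) and b (k x n)."""
    m, k, n = len(a), len(b), len(b[0])
    assert all(len(row) == k for row in a), "inner dimensions must match"
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(n)]
            for i in range(m)]
```

Whichever ggml version agrees with the reference on the failing test input is the one the CPU test cases should ultimately target.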
Just to note, I'm also having a hard time porting it to Vulkan. After implementing the missing *_1d() functions, the output is total garbage (white noise + buzzing sound).

The first thing I'd recommend trying is seeing whether any of the tests pass with the Vulkan process by leaving only one uncommented at a time; that could help isolate the divergence in the Vulkan path.
This is an attempt to rebase on the latest commit of the ggml master branch. My primary goal is to add Vulkan/OpenCL support, as I only have AMD GPUs.

Tested and working well on CPU, but I don't have an Nvidia card around, so I cannot test the CUDA backend.

This needs the following PR on ggml to work: balisujohn/ggml#1