
Update and fix Vulkan soft_max and argsort implementations #7237

Merged: 2 commits merged into master on May 18, 2024

Conversation

@0cc4m (Collaborator) commented May 12, 2024

I updated Vulkan for the changes in #7192 and fixed a bug in the soft_max implementation. That allowed me to clean up some code that was only needed for the three-input-tensor soft_max op.

I also updated and fixed the argsort implementation. Now test-backend-ops fully passes for the Vulkan backend.
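For reference, the op the shader has to reproduce is a scaled, masked softmax over each row of logits. Below is a minimal CPU-side sketch of those semantics, assuming the common formulation softmax(scale * x + mask) per row; the names are illustrative only, not the ggml API:

#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Reference row softmax: scale the logits, add the additive mask
// (e.g. -INFINITY for masked positions), then normalize. Uses the
// usual max subtraction for numerical stability.
std::vector<float> soft_max_row(const std::vector<float> & x,
                                const std::vector<float> & mask,
                                float scale) {
    std::vector<float> t(x.size());
    float max_val = -INFINITY;
    for (std::size_t i = 0; i < x.size(); ++i) {
        t[i] = scale * x[i] + mask[i];     // scale logits, add mask
        max_val = std::max(max_val, t[i]); // track row max for stability
    }
    float sum = 0.0f;
    for (std::size_t i = 0; i < t.size(); ++i) {
        t[i] = std::exp(t[i] - max_val);   // numerically stable exponentiation
        sum += t[i];
    }
    for (float & v : t) {
        v /= sum;                          // row sums to 1
    }
    return t;
}

test-backend-ops compares each backend's output for ops like this against the CPU reference, which is why a full pass is a meaningful signal for the Vulkan backend.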

@0cc4m requested a review from ggerganov May 12, 2024 07:00
@mofosyne added the Vulkan and Review Complexity : High labels May 12, 2024
@Adriankhl (Contributor) commented May 12, 2024

Not sure if this is the right place to discuss this, but I am digging into issue #7130.

Here is the root cause:

Embedding computation always tries to allocate a zero-size buffer first.

Because of size += TENSOR_ALIGNMENT, size is always greater than 0 for the CPU backend (not sure if this is the correct behaviour, though), so the CPU backend can always allocate a buffer successfully.

llama.cpp/ggml-backend.c

Lines 625 to 631 in b228aba

GGML_CALL static ggml_backend_buffer_t ggml_backend_cpu_buffer_type_alloc_buffer(ggml_backend_buffer_type_t buft, size_t size) {
    size += TENSOR_ALIGNMENT;   // malloc may return an address that is not aligned
    void * data = malloc(size); // TODO: use GGML_ALIGNED_MALLOC (move to ggml-impl.h)
    if (data == NULL) {
        fprintf(stderr, "%s: failed to allocate buffer of size %zu\n", __func__, size);
        return NULL;
    }

For the Vulkan backend, ptr is still nullptr here after ggml_vk_host_malloc if size is 0.

llama.cpp/ggml-vulkan.cpp

Lines 6031 to 6043 in b228aba

GGML_CALL static ggml_backend_buffer_t ggml_backend_vk_host_buffer_type_alloc_buffer(ggml_backend_buffer_type_t buft, size_t size) {
#ifdef GGML_VULKAN_DEBUG
    std::cerr << "ggml_backend_vk_host_buffer_type_alloc_buffer(" << size << ")" << std::endl;
#endif
    void * ptr = nullptr;
    try {
        ptr = ggml_vk_host_malloc(&vk_instance.contexts[0], size);
    } catch (vk::SystemError& e) {
        std::cerr << "ggml_vulkan: Failed to allocate pinned memory." << std::endl;
        std::cerr << "ggml_vulkan: " << e.what() << std::endl;
        // fallback to cpu buffer
        return ggml_backend_buft_alloc_buffer(ggml_backend_cpu_buffer_type(), size);
    }

And because ggml_vk_host_malloc completes without throwing an exception, the catch block is never entered, and the null pointer causes problems later on.

Should there be a null check here that throws an exception? Falling back to the CPU buffer actually works, despite the warning.
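A sketch of the guard being suggested, placed right after the try/catch above (hypothetical, not necessarily what was merged): treat a null pointer the same way as a vk::SystemError and take the existing CPU fallback path.

    // Hypothetical null check: ggml_vk_host_malloc can return nullptr
    // without throwing (e.g. for a zero-size request), so mirror the
    // catch block and fall back to a CPU buffer instead of wrapping a
    // null pointer in a backend buffer.
    if (ptr == nullptr) {
        std::cerr << "ggml_vulkan: host malloc returned null for size " << size << ", falling back to CPU buffer." << std::endl;
        return ggml_backend_buft_alloc_buffer(ggml_backend_cpu_buffer_type(), size);
    }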

@mofosyne added the bugfix label May 12, 2024
@ggerganov (Owner) left a comment

Might be a good idea before merging to run the 2 tests from #7192 and verify that the output is reasonable.

@Adriankhl (Contributor) replied, quoting his earlier analysis:

Nevermind, the issue is much deeper than this. Please ignore it here

@0cc4m merged commit c1b295e into master May 18, 2024 (60 checks passed)
@0cc4m deleted the 0cc4m/soft-max-fix branch May 18, 2024 06:11