
clip : offload to GPU #4061

Closed
ggerganov opened this issue Nov 13, 2023 · 12 comments
Labels
good first issue Good for newcomers performance Speed related topics

Comments

@ggerganov
Owner

With the recent support for running convolutions on the GPU (#4060) we should be able to offload CLIP to run fully on the GPU.

static ggml_cgraph * clip_image_build_graph(const clip_ctx * ctx, const clip_image_f32_batch * imgs) {
    if (!ctx->has_vision_encoder) {
        printf("This gguf file seems to have no vision encoder\n");
        return nullptr;
    }
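
For context, a rough sketch of what running this graph fully on the GPU could look like with the ggml-backend API. The calls used here (ggml_backend_cuda_init, ggml_gallocr_new, ggml_backend_graph_compute) are real ggml-backend functions, but their exact signatures moved around between ggml revisions of this period and weight placement is glossed over, so treat this as an illustrative sketch rather than the actual clip.cpp change:

// sketch only: hand a prebuilt clip graph to a GPU backend instead of the CPU
#include "ggml.h"
#include "ggml-backend.h"
#ifdef GGML_USE_CUBLAS
#include "ggml-cuda.h"
#endif

static bool clip_compute_on_gpu(const clip_ctx * ctx, const clip_image_f32_batch * imgs) {
#ifdef GGML_USE_CUBLAS
    ggml_backend_t backend = ggml_backend_cuda_init(0);   // device 0
#else
    ggml_backend_t backend = ggml_backend_cpu_init();     // fallback when built without CUDA
#endif
    if (backend == NULL) {
        return false;
    }

    ggml_cgraph * gf = clip_image_build_graph(ctx, imgs); // the function quoted above
    if (gf == NULL) {
        ggml_backend_free(backend);
        return false;
    }

    // allocate the graph's tensors in the backend's buffer and run it there;
    // in a real change the model weights also have to live in a backend buffer
    ggml_gallocr_t alloc = ggml_gallocr_new(ggml_backend_get_default_buffer_type(backend));
    ggml_gallocr_alloc_graph(alloc, gf);
    ggml_backend_graph_compute(backend, gf);

    ggml_gallocr_free(alloc);
    ggml_backend_free(backend);
    return true;
}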

@ggerganov ggerganov added good first issue Good for newcomers performance Speed related topics labels Nov 13, 2023
@cmp-nct
Contributor

cmp-nct commented Nov 13, 2023

It seems minor, but I believe supporting CLIP is a major step ahead; it's such a fundamental model.

@ggerganov
Owner Author

Ideally, CLIP should be supported as a separate model arch in llama.cpp, but it will take some extra work to achieve this: abetlen/llama-cpp-python#813 (comment)

We should do it at some point in the future.

@monatis
Collaborator

monatis commented Nov 13, 2023

Ideally, CLIP should be supported as a separate model arch in llama.cpp,

Maybe we can start by porting the full text and vision encoder parts from my clip.cpp to llama.cpp/examples/llava/clip.[h/cpp], and with the community's testing and feedback we can polish the implementation gradually. Then we can include it directly in llama.cpp as an additional arch once we are confident about its public API and functionality. Or I can continue to develop it externally in that repo and merge it later. @ggerganov WDYT?

@cmp-nct
Contributor

cmp-nct commented Nov 14, 2023

I'd love to see full CLIP support in llama.cpp soon.
The current clip implementation is cut down to only what llava needed; monatis's version contains a lot more functionality.
Imho we should aim to get the full feature set in. The most important use case is probably llava, but as a standalone image-analysis tool CLIP is very valuable.

@FSSRepo
Collaborator

FSSRepo commented Nov 26, 2023

@ggerganov I have implemented broadcasting for the ggml_add and ggml_mul operations (only for the CPU and CUDA backends). I am just waiting for my pull request to be merged into stable diffusion and will then have some time to incorporate the changes I made in ggml.
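
A minimal sketch of what broadcasting in ggml_add enables, assuming a ggml revision that has ggml_new_graph and ggml_graph_compute_with_ctx (the graph-compute entry points changed around this time); the tensor shapes are made-up placeholders. The point is that the bias no longer needs an explicit ggml_repeat before the add:

#include "ggml.h"

int main(void) {
    struct ggml_init_params params = {
        /*.mem_size   =*/ 16 * 1024 * 1024,  // arena for tensors + graph
        /*.mem_buffer =*/ NULL,
        /*.no_alloc   =*/ false,
    };
    struct ggml_context * ctx = ggml_init(params);

    // activations: [hidden, tokens], bias: [hidden]
    struct ggml_tensor * x = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 1024, 577);
    struct ggml_tensor * b = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 1024);

    // with broadcasting, b is implicitly repeated along the token dimension,
    // instead of materializing ggml_repeat(ctx, b, x) first
    struct ggml_tensor * y = ggml_add(ctx, x, b);

    struct ggml_cgraph * gf = ggml_new_graph(ctx);
    ggml_build_forward_expand(gf, y);
    ggml_graph_compute_with_ctx(ctx, gf, /*n_threads=*/ 4);

    ggml_free(ctx);
    return 0;
}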

@ggerganov
Owner Author

Great! Would be great to PR them in llama.cpp (i.e. here) so that we can test CLIP performance.
I think I will be able to help with the Metal implementation

@FSSRepo
Collaborator

FSSRepo commented Nov 26, 2023

Great! Would be great to PR them in llama.cpp (i.e. here) so that we can test CLIP performance. I think I will be able to help with the Metal implementation

See #4205. I think that, for now, we shouldn't merge that pull request until the changes I made to ggml are applied in the main project. This way, we'll also have a more comprehensive implementation, eliminating the explicit repeat operations and all that.

@y10ab1
Contributor

y10ab1 commented Dec 8, 2023

Do we have any updates on this feature? I am eager to use it!

@cmp-nct
Contributor

cmp-nct commented Dec 8, 2023

@ggerganov @FSSRepo
It would be awesome to get this pushed into ggml and llama.cpp.
Did you see my discussion on CogVLM? #4350
It's a vision model that beats GPT4-Vision and should run well on 8-9 GB VRAM when quantized; it's the first time I have seen anything beating OpenAI. We will definitely need full CLIP offload, but the main obstacle is that it has an additional architecture (more than just llava's two-layer projection) that connects Vicuna-7 with Big-ViT.
I know it's a bit off topic here, just pushing this because I think it's so significant and totally overlooked.
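
For reference on the "two-layer projection" point: llava-1.5's multimodal projector is essentially a two-layer MLP that maps ViT patch embeddings into the LLM embedding space, roughly the following in ggml. Tensor names and dimensions here are illustrative placeholders, not the actual clip.cpp identifiers; CogVLM adds a substantially larger connector on top of something like this:

// sketch of an llava-1.5 style projector: linear -> GELU -> linear
// (names w0/b0/w1/b1 and the dimensions are hypothetical)
static struct ggml_tensor * mm_projector(
        struct ggml_context * ctx,
        struct ggml_tensor  * embeddings,                   // [d_vision, n_patches]
        struct ggml_tensor  * w0, struct ggml_tensor * b0,  // [d_vision, d_llm], [d_llm]
        struct ggml_tensor  * w1, struct ggml_tensor * b1)  // [d_llm, d_llm],    [d_llm]
{
    struct ggml_tensor * cur = ggml_mul_mat(ctx, w0, embeddings); // [d_llm, n_patches]
    cur = ggml_add(ctx, cur, b0);      // broadcast bias add
    cur = ggml_gelu(ctx, cur);         // non-linearity between the two layers
    cur = ggml_mul_mat(ctx, w1, cur);  // [d_llm, n_patches]
    cur = ggml_add(ctx, cur, b1);
    return cur;                        // image tokens in the LLM's embedding space
}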

@FSSRepo
Collaborator

FSSRepo commented Dec 11, 2023

@cmp-nct It seems that the architecture of that vision model differs from the CLIP and Llama implementations here. The truth is that there will be a lot of work to do if we want to have it here.

@cmp-nct
Contributor

cmp-nct commented Dec 11, 2023

You are certainly right about the work required; it's likely about as much as the entire clip.cpp effort has been.

At this point it's the best thing we have in open source for vision; it's right at eye level with GPT4-Vision.
That's the first time I have seen anything open (or closed) really compete with the best OpenAI has to offer.

For "simple vision", llava-1.5 (ShareGPT4V atm) is working great with clip.cpp.
If we want really good vision with good OCR, then CogVLM would be the current choice.

The only high-level alternative is QwenVL, which is significantly worse than CogVLM and about the same amount of work to integrate here.

@ggerganov
Owner Author

Done via #4205 and #4696
