Can madlad400 gguf models from huggingface be used? #8300
-
I compiled the latest version, which has T5 support, and tried running a madlad400 model from https://huggingface.co/jbochi/madlad400-3b-mt/resolve/main/model-q4k.gguf, but couldn't get it to work.
Is there a change in the conversion process from .safetensors which is needed for T5 models?
-
Well, I couldn't figure out a way to use jbochi's gguf directly either, so I think it's necessary to use the conversion script convert_hf_to_gguf.py. Btw, super enthused about these recent additions, this project just keeps getting better :D
EDIT: OK, so this behaves somewhat similarly to candle, but the glitches are slightly different.
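For anyone trying the same route, converting from the original HF checkpoint (safetensors, not the pre-made gguf) would look roughly like this. The local paths and output filename are placeholders, and this assumes a current llama.cpp checkout with T5 support:

```shell
# Run llama.cpp's conversion script on a local HF model directory
# (the original safetensors repo, not jbochi's pre-quantized gguf).
# Paths and the output name below are placeholders.
python convert_hf_to_gguf.py /path/to/madlad400-3b-mt \
    --outfile madlad400-3b-mt-f16.gguf --outtype f16
```

The resulting f16 gguf can then be quantized down with llama.cpp's own quantization tool if needed.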
-
Okay I got it working now (I think), and man it feels FAST!! The bad news is that my GGUF conversion procedure from jbochi => llama.cpp was quite a messy business indeed. It involved conjuring up an empty GGUF, filling it with metadata, and doing some frankensteining with KerfuffleV2's gguf-tools. I also wrote a custom script to rename the tensors, and llama.cpp itself needed a teeny weeny change too. The upside of this method is that the quantized tensors remain untouched. I can give more details if there's interest, but somehow I feel there must be a better way :D
EDIT: I've now managed to polish the conversion process a little bit, so that no llama.cpp customization is necessary any longer. Here's the patch if anyone wants to try this version. You'll need the original jbochi model and xdelta3.
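Applying such an xdelta3 patch would look something like this; the patch filename and output name here are hypothetical, only model-q4k.gguf comes from the thread:

```shell
# Decode (-d) the delta against the original jbochi gguf (-s = source file),
# producing a llama.cpp-compatible gguf. Patch/output names are placeholders.
xdelta3 -d -s model-q4k.gguf madlad400-llamacpp.xdelta3 madlad400-llamacpp.gguf
```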
-
Did anyone make a llama.cpp-compatible gguf for another T5 model, aya-101: https://huggingface.co/CohereForAI/aya-101 ?
-
Oh wow, there they are, popping up at HF now:
-
OK, just got aya-101 working! The catch is that you have to quantize it yourself. I wanted to test quantizing a large model with meager resources, and this was as good a candidate as any. (Of course, "large" is relative... in the era of 405B this is peanuts, really :)
-
aya-101 is missing the spiece.model file, which is needed to convert it. I copied the one from mt5-xxl, which enabled the conversion to work, and created an IQ4_XS quant.

bash-5.1$ lm "translate to finnish: I wanted to test quantizing a large model with meager resources and this was as good a candidate as any."

The model is pretty dumb; it looks mainly useful for translations:

lm "Answer the following yes/no question by reasoning step-by-step. Could a dandelion suffer from hepatitis?"

I translated the question to German with madlad400, same answer:

bash-5.1$ lm "Beantworten Sie die folgende Ja/Nein-Frage schrittweise: Könnte ein Löwenzahn an Hepatitis leiden?"
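The workaround above would sketch out roughly as follows. All paths are placeholders, and `lm` is presumably the poster's own wrapper around llama.cpp's CLI:

```shell
# Borrow the SentencePiece model from mt5-xxl, since aya-101 ships without one.
cp /path/to/mt5-xxl/spiece.model /path/to/aya-101/spiece.model

# Convert the HF checkpoint to gguf, then quantize to IQ4_XS
# with llama.cpp's quantization tool.
python convert_hf_to_gguf.py /path/to/aya-101 \
    --outfile aya-101-f16.gguf --outtype f16
./llama-quantize aya-101-f16.gguf aya-101-IQ4_XS.gguf IQ4_XS
```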
-
Gotta agree on dumb :D IQ4_XS, you say? I wonder how that imatrix thing is handled in these multilingual models.

Btw, in case anyone's wondering: yes, you can run this on said C2D/4GB machine. Well, it's more of a crawl, though.

vvv Thanks vvv

--repeat-penalty 2.0 and leveling up to IQ4_XS mitigated the looping problem, but not all the way.
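As a sketch, the repetition-penalty mitigation mentioned above maps to a llama-cli invocation like this (model filename and prompt are illustrative):

```shell
# A repeat penalty of 2.0 is unusually aggressive (the default is much milder),
# but it helps suppress the looping seen with this model.
./llama-cli -m aya-101-IQ4_XS.gguf --repeat-penalty 2.0 \
    -p "translate to finnish: Could a dandelion suffer from hepatitis?"
```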