convert.py xgen support #2053
Conversation
Currently erroring out in the vocab size check. I can get past it if I neuter the check with this diff:
diff --git a/convert.py b/convert.py
index c593abe..e7e7654 100644
--- a/convert.py
+++ b/convert.py
@@ -969,6 +969,7 @@ def bounded_parallel_map(func: Callable[[In], Out], iterable: Iterable[In], conc
 def check_vocab_size(params: Params, vocab: Vocab) -> None:
+    return
     if params.n_vocab != vocab.vocab_size:
         # GGMLVocab comes from the same file as the model so shouldn't mismatch:
         assert isinstance(vocab, SentencePieceVocab) or isinstance(vocab, XgenVocab)
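(A more targeted alternative to the blanket return, purely as a sketch and not the PR's code: keep the check but special-case XgenVocab, whose real token count can be smaller than the model's padded n_vocab of 51200 seen in the logs below. Names follow the existing convert.py helpers.)

# Sketch only: tolerate an XgenVocab that is smaller than the model's (padded)
# n_vocab, but still fail loudly on any other mismatch.
def check_vocab_size(params: Params, vocab: Vocab) -> None:
    if params.n_vocab == vocab.vocab_size:
        return
    if isinstance(vocab, XgenVocab) and vocab.vocab_size < params.n_vocab:
        # the gap is expected to be filled with placeholder tokens when the
        # vocab is written out (see the token generator discussed below)
        return
    raise Exception(f"Vocab size mismatch: model expects {params.n_vocab} entries, "
                    f"tokenizer provides {vocab.vocab_size}")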
I've tried to use this PR to convert xgen-4k to GGML. The conversion worked, but it failed when loading the model for inference. Here is the error message:
@clyang that was with the patch to … I wonder instead of …
I implemented this and I was able to use the model with …
What is needed to finalize this?
This seems to work and could be merged, but I'm not sure about correctness. There may be tokens missing, or a better placeholder token value (vs …). Also, I haven't had a chance to look yet, but on the …
            token = self.xt._convert_id_to_token(index)
            yield (token, float(index))
        for index in range(self.vocab_size_base, self.vocab_size):
            yield (b'', float(index))
Suggested change:
-            yield (b'', float(index))
+            yield (b'<|unk|>', float(index))
🤷
I've tried the latest commit and this is what I got.
$ python3 convert.py models/xgen-4k-7b-orig --outtype f16
Loading model file models/xgen-4k-7b-orig/pytorch_model-00001-of-00003.bin
Loading model file models/xgen-4k-7b-orig/pytorch_model-00001-of-00003.bin
Loading model file models/xgen-4k-7b-orig/pytorch_model-00002-of-00003.bin
Loading model file models/xgen-4k-7b-orig/pytorch_model-00003-of-00003.bin
params: n_vocab:51200 n_embd:4096 n_mult:256 n_head:32 n_layer:32
Writing vocab...
Traceback (most recent call last):
File "/home/user/llama.cpp/convert.py", line 1255, in <module>
main()
File "/home/user/llama.cpp/convert.py", line 1250, in main
OutputFile.write_all(outfile, params, output_type, model, vocab)
File "/home/user/llama.cpp/convert.py", line 1041, in write_all
of.write_vocab(vocab)
File "/home/user/llama.cpp/convert.py", line 1022, in write_vocab
self.fout.write(text)
TypeError: a bytes-like object is required, not 'str'
@clyang Try changing line 220 of convert.py to the following:
token = self.xt.encoder.decode_single_token_bytes(index)
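For context, here is a sketch of how the surrounding XgenVocab generator might look with that one-line change applied. The method name and return type are assumed (mirroring convert.py's SentencePieceVocab); self.xt, vocab_size_base, and vocab_size come from the excerpt above. tiktoken's decode_single_token_bytes() returns bytes, which is what fout.write() expects, whereas _convert_id_to_token() returned a str and triggered the TypeError shown above.

# Sketch, assuming XgenVocab wraps the HF XGen tokenizer as self.xt and
# exposes its tiktoken Encoding as self.xt.encoder.
from typing import Iterable, Tuple

def all_tokens(self) -> Iterable[Tuple[bytes, float]]:
    # real tokens come back as raw bytes, so write_vocab() can pass them
    # straight to fout.write()
    for index in range(self.vocab_size_base):
        token = self.xt.encoder.decode_single_token_bytes(index)
        yield (token, float(index))
    # pad out to the model's n_vocab (51200) with placeholder entries
    for index in range(self.vocab_size_base, self.vocab_size):
        yield (b'', float(index))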
I can confirm convert.py and quantize both work, thanks @smdesai !!
Btw, you still need to change the EOS, BOS, and NL token IDs in llama.cpp to make inference work correctly.
Can you share what you changed in llama.cpp?
@tmm1 Here is what I modified in llama.cpp:
llama_token llama_token_bos() {
    return 50256;  // XGen: BOS and EOS share the same token id
}
llama_token llama_token_eos() {
    return 50256;
}
llama_token llama_token_nl() {
    return 198;    // XGen newline token id
}
Hi, following this thread, I tried to convert to q4_0 or q4_1 but seem to be facing problems.
ubuntu@ip-172-31-41-82:~/squeeze-llm/llama.cpp$ python3 convert.py models/xgen-7b-8k-base --outtype q4_1
Loading model file models/xgen-7b-8k-base/pytorch_model-00001-of-00003.bin
Loading model file models/xgen-7b-8k-base/pytorch_model-00001-of-00003.bin
Loading model file models/xgen-7b-8k-base/pytorch_model-00002-of-00003.bin
Loading model file models/xgen-7b-8k-base/pytorch_model-00003-of-00003.bin
params: n_vocab:51200 n_embd:4096 n_mult:256 n_head:32 n_layer:32
Traceback (most recent call last):
File "/home/ubuntu/squeeze-llm/llama.cpp/convert.py", line 1255, in <module>
main()
File "/home/ubuntu/squeeze-llm/llama.cpp/convert.py", line 1248, in main
model = convert_to_output_type(model, output_type)
File "/home/ubuntu/squeeze-llm/llama.cpp/convert.py", line 1086, in convert_to_output_type
return {name: tensor.astype(output_type.type_for_tensor(name, tensor))
File "/home/ubuntu/squeeze-llm/llama.cpp/convert.py", line 1086, in <dictcomp>
return {name: tensor.astype(output_type.type_for_tensor(name, tensor))
File "/home/ubuntu/squeeze-llm/llama.cpp/convert.py", line 575, in astype
self.validate_conversion_to(data_type)
File "/home/ubuntu/squeeze-llm/llama.cpp/convert.py", line 586, in validate_conversion_to
raise Exception(f"Can't turn an unquantized tensor into a quantized type ({data_type})")
Exception: Can't turn an unquantized tensor into a quantized type (QuantizedDataType(groupsize=32, have_addends=True, have_g_idx=False))
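For what it's worth, that traceback is expected: convert.py refuses to turn unquantized tensors into a quantized type, so q4_0/q4_1 have to be produced afterwards with the separate quantize tool (which @clyang confirmed works above). A typical two-step run would look roughly like this; the output file names are illustrative:
$ python3 convert.py models/xgen-7b-8k-base --outtype f16
$ ./quantize models/xgen-7b-8k-base/ggml-model-f16.bin models/xgen-7b-8k-base/ggml-model-q4_1.bin q4_1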
Maybe we should finalize this and merge it?
The code base has changed substantially since the initial PR. I can confirm that with the code as of today, the XGen model converts and runs at f32, Q4_1, and Q4_0. In addition, llama.cpp also required changes to work with the XGen model. I'm not sure of the best way to proceed with those changes, so for now I have put them inside an #ifdef/#ifndef XGEN. As for the best way to provide the changes, I'm not sure whether a separate PR would be better; to keep things local, here are the changes as patch files. Of course, the Makefile would require -DXGEN when compiling for the model.
Obsolete PR?
Working toward converting XGen-7B to GGML.