
convert.py xgen support #2053

Closed · wants to merge 3 commits

Conversation

tmm1 commented Jun 30, 2023

working toward converting XGen-7B to ggml

tmm1 (Author) commented Jun 30, 2023

currently erroring out here:

  File "llama.cpp/convert.py", line 986, in check_vocab_size
    raise Exception(msg)
Exception: Vocab size mismatch (model has 51200, but models/xgen-7b-8k-base has 50313).

If I neuter check_vocab_size, it continues and seems to create a ggml bin with the vocabulary included.

diff --git a/convert.py b/convert.py
index c593abe..e7e7654 100644
--- a/convert.py
+++ b/convert.py
@@ -969,6 +969,7 @@ def bounded_parallel_map(func: Callable[[In], Out], iterable: Iterable[In], conc
 
 
 def check_vocab_size(params: Params, vocab: Vocab) -> None:
+    return
     if params.n_vocab != vocab.vocab_size:
         # GGMLVocab comes from the same file as the model so shouldn't mismatch:
         assert isinstance(vocab, SentencePieceVocab) or isinstance(vocab, XgenVocab)

ggerganov added the high priority (Very important issue) and model (Model specific) labels Jul 1, 2023
clyang (Contributor) commented Jul 6, 2023

I've tried to use this PR to convert xgen-4k to GGML. It worked during conversion but failed when loading the model for inference. Here is the error message:

main: build = 796 (31cfbb1)
main: seed  = 1688621194
llama.cpp: loading model from models/xgen-4k-7b/ggml-model-f16.bin
error loading model: unexpectedly reached end of file
llama_load_model_from_file: failed to load model
llama_init_from_gpt_params: error: failed to load model 'models/xgen-4k-7b/ggml-model-f16.bin'
main: error: unable to load model

tmm1 (Author) commented Jul 6, 2023

@clyang that was with the patch to check_vocab_size?

I wonder if, instead of the early return, we could fill in the missing tokens with empty values.

tmm1 (Author) commented Jul 6, 2023

> I wonder if, instead of the early return, we could fill in the missing tokens with empty values.

I implemented this, and I was able to use the model with ./main after converting with convert.py --outtype f16:

$ python3 convert.py models/xgen-7b-8k-base --outtype f16
$ ./main -m models/xgen-7b-8k-base/ggml-model-f16.bin -n 128 -p "..."
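
For reference, a minimal sketch of the padding idea (a hypothetical helper, not the PR's actual code; the real change appears in the XgenVocab token generator excerpted in the review below): pad the token list out to the model's declared n_vocab with empty byte strings so check_vocab_size passes.

# Hypothetical sketch: pad a (token_bytes, score) list up to the model's
# declared vocab size with empty placeholder tokens.
def pad_vocab(tokens: list[tuple[bytes, float]], n_vocab: int) -> list[tuple[bytes, float]]:
    padded = list(tokens)
    for index in range(len(padded), n_vocab):
        # ids the tokenizer never defines (here 50313..51199) get an empty placeholder
        padded.append((b'', float(index)))
    return padded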

ggerganov (Owner) commented

What is needed to finalize this?

tmm1 (Author) commented Jul 7, 2023

This seems to work and could be merged, but I'm not sure about correctness: there may be tokens missing, or a better placeholder value than b'' for the unknown tokens.

Also, I haven't had a chance to look, but on the ./main inference side I wasn't sure whether it uses the embedded vocab for tokenization, and whether other changes are required there to account for the differences between tiktoken and sentencepiece.

Review thread on convert.py:

            token = self.xt._convert_id_to_token(index)
            yield (token, float(index))
        for index in range(self.vocab_size_base, self.vocab_size):
            yield (b'', float(index))
tmm1 (Author):

Suggested change:
-            yield (b'', float(index))
+            yield (b'<|unk|>', float(index))

🤷

clyang (Contributor):

I've tried the latest commit and this is what I got:

$ python3 convert.py models/xgen-4k-7b-orig --outtype f16
Loading model file models/xgen-4k-7b-orig/pytorch_model-00001-of-00003.bin
Loading model file models/xgen-4k-7b-orig/pytorch_model-00001-of-00003.bin
Loading model file models/xgen-4k-7b-orig/pytorch_model-00002-of-00003.bin
Loading model file models/xgen-4k-7b-orig/pytorch_model-00003-of-00003.bin
params: n_vocab:51200 n_embd:4096 n_mult:256 n_head:32 n_layer:32
Writing vocab...
Traceback (most recent call last):
  File "/home/user/llama.cpp/convert.py", line 1255, in <module>
    main()
  File "/home/user/llama.cpp/convert.py", line 1250, in main
    OutputFile.write_all(outfile, params, output_type, model, vocab)
  File "/home/user/llama.cpp/convert.py", line 1041, in write_all
    of.write_vocab(vocab)
  File "/home/user/llama.cpp/convert.py", line 1022, in write_vocab
    self.fout.write(text)
TypeError: a bytes-like object is required, not 'str'

smdesai:

@clyang Try changing line 220 of convert.py to the following:

token = self.xt.encoder.decode_single_token_bytes(index)
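
That lines up with the TypeError above: a transformers-style _convert_id_to_token returns a str, while the vocab writer passes bytes to fout.write(). tiktoken's Encoding has decode_single_token_bytes, which returns the raw bytes for a single token id. A small illustration, assuming the stock tiktoken API (the gpt2 encoding is used here only for demonstration):

import tiktoken

enc = tiktoken.get_encoding("gpt2")
print(enc.decode_single_token_bytes(198))  # b'\n' -- bytes, safe to write to the ggml file
print(enc.decode([198]))                   # '\n'  -- str, which is what triggered the TypeError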

clyang (Contributor):

I can confirm convert.py and quantize both work, thanks @smdesai !!

Btw, you still need to change the EOS, BOS and NL token IDs in llama.cpp to make inference work correctly.

tmm1 (Author):

Can you share what you changed in llama.cpp?

clyang (Contributor):

@tmm1 Here is what I modified in llama.cpp:

llama_token llama_token_bos() {
    return 50256;   // presumably <|endoftext|> in the GPT-2 BPE vocab that XGen's tiktoken tokenizer extends
}

llama_token llama_token_eos() {
    return 50256;   // same <|endoftext|> id reused for EOS
}

llama_token llama_token_nl() {
    return 198;     // the "\n" token id in that vocab
}

tmm1 marked this pull request as ready for review July 10, 2023 18:06
gee842 commented Jul 17, 2023

Hi, following this thread, I tried to convert to q4_0 or q4_1 but I seem to be running into problems.
Here is my output:

ubuntu@ip-172-31-41-82:~/squeeze-llm/llama.cpp$ python3 convert.py models/xgen-7b-8k-base --outtype q4_1
Loading model file models/xgen-7b-8k-base/pytorch_model-00001-of-00003.bin
Loading model file models/xgen-7b-8k-base/pytorch_model-00001-of-00003.bin
Loading model file models/xgen-7b-8k-base/pytorch_model-00002-of-00003.bin
Loading model file models/xgen-7b-8k-base/pytorch_model-00003-of-00003.bin
params: n_vocab:51200 n_embd:4096 n_mult:256 n_head:32 n_layer:32
Traceback (most recent call last):
  File "/home/ubuntu/squeeze-llm/llama.cpp/convert.py", line 1255, in <module>
    main()
  File "/home/ubuntu/squeeze-llm/llama.cpp/convert.py", line 1248, in main
    model = convert_to_output_type(model, output_type)
  File "/home/ubuntu/squeeze-llm/llama.cpp/convert.py", line 1086, in convert_to_output_type
    return {name: tensor.astype(output_type.type_for_tensor(name, tensor))
  File "/home/ubuntu/squeeze-llm/llama.cpp/convert.py", line 1086, in <dictcomp>
    return {name: tensor.astype(output_type.type_for_tensor(name, tensor))
  File "/home/ubuntu/squeeze-llm/llama.cpp/convert.py", line 575, in astype
    self.validate_conversion_to(data_type)
  File "/home/ubuntu/squeeze-llm/llama.cpp/convert.py", line 586, in validate_conversion_to
    raise Exception(f"Can't turn an unquantized tensor into a quantized type ({data_type})")
Exception: Can't turn an unquantized tensor into a quantized type (QuantizedDataType(groupsize=32, have_addends=True, have_g_idx=False))
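
The traceback suggests this version of convert.py only writes unquantized tensors (f32/f16) and refuses to quantize them directly; quantized files are normally produced afterwards with the separate quantize tool that clyang confirmed works above. A typical two-step flow, sketched with the paths from this thread:

$ python3 convert.py models/xgen-7b-8k-base --outtype f16
$ ./quantize models/xgen-7b-8k-base/ggml-model-f16.bin models/xgen-7b-8k-base/ggml-model-q4_1.bin q4_1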

ggerganov (Owner) commented

Maybe we should finalize this and merge it?
I won't be able to test this, so I'm looking for some help in confirming that it works and fixing it if it doesn't.

ggerganov added the help wanted (Extra attention is needed) label Jul 26, 2023
goerch mentioned this pull request Aug 7, 2023
smdesai commented Sep 15, 2023

The code base has changed substantially since the initial PR. I can confirm that, with the code as of today, the Xgen model does convert and run in f32, Q4_1 and Q4_0.

In addition, llama.cpp itself required changes to work with the Xgen model. I'm not sure of the best way to proceed with these changes, so for now I have put them inside an ifdef/ifndef XGEN.

As for the best way to provide the changes, I'm not sure whether a separate PR would be better; to keep things local, here are the changes as patch files. Of course the Makefile would require -DXGEN when compiling for this model.

llama.cpp.patch
convert.py.patch
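
For readers who don't want to open the patches, a minimal sketch of what such a guard might look like, using the token ids clyang posted above and the stock llama.cpp defaults as the fallback (illustrative only; the attached patch files are authoritative):

// llama.cpp -- build with -DXGEN (see the Makefile note above) to switch to the XGen ids
llama_token llama_token_bos() {
#ifdef XGEN
    return 50256;  // XGen/tiktoken <|endoftext|>
#else
    return 1;      // default LLaMA SentencePiece BOS
#endif
}

llama_token llama_token_eos() {
#ifdef XGEN
    return 50256;
#else
    return 2;      // default LLaMA SentencePiece EOS
#endif
}

llama_token llama_token_nl() {
#ifdef XGEN
    return 198;    // "\n" in the XGen vocab
#else
    return 13;     // "\n" in the LLaMA SentencePiece vocab
#endif
}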

ggerganov removed the help wanted (Extra attention is needed) and high priority (Very important issue) labels Jan 18, 2024
mofosyne (Collaborator) commented Jun 9, 2024

Obsolete PR?

mofosyne added the obsolete? (Marker for potentially obsolete PR) label Jun 9, 2024
tmm1 closed this Jun 9, 2024