
convert.py xgen support #2053

Closed · wants to merge 3 commits

Conversation

tmm1 commented Jun 30, 2023

working toward converting XGen-7B to ggml

tmm1 (Author) commented Jun 30, 2023

currently erroring out here:

  File "llama.cpp/convert.py", line 986, in check_vocab_size
    raise Exception(msg)
Exception: Vocab size mismatch (model has 51200, but models/xgen-7b-8k-base has 50313).

If I neuter check_vocab_size, it continues and seems to create a ggml bin with the vocabulary included.

diff --git a/convert.py b/convert.py
index c593abe..e7e7654 100644
--- a/convert.py
+++ b/convert.py
@@ -969,6 +969,7 @@ def bounded_parallel_map(func: Callable[[In], Out], iterable: Iterable[In], conc
 
 
 def check_vocab_size(params: Params, vocab: Vocab) -> None:
+    return
     if params.n_vocab != vocab.vocab_size:
         # GGMLVocab comes from the same file as the model so shouldn't mismatch:
         assert isinstance(vocab, SentencePieceVocab) or isinstance(vocab, XgenVocab)

ggerganov added the high priority (Very important issue) and model (Model specific) labels Jul 1, 2023
clyang (Contributor) commented Jul 6, 2023

I've tried to use this PR to convert xgen-4k to GGML. It worked during conversion but failed when loading the model for inference. Here is the error message:

main: build = 796 (31cfbb1)
main: seed  = 1688621194
llama.cpp: loading model from models/xgen-4k-7b/ggml-model-f16.bin
error loading model: unexpectedly reached end of file
llama_load_model_from_file: failed to load model
llama_init_from_gpt_params: error: failed to load model 'models/xgen-4k-7b/ggml-model-f16.bin'
main: error: unable to load model

tmm1 (Author) commented Jul 6, 2023

@clyang that was with the patch to check_vocab_size?

I wonder if, instead of the early return, we could fill in the missing tokens with empty values.

tmm1 (Author) commented Jul 6, 2023

> I wonder if, instead of the early return, we could fill in the missing tokens with empty values.

I implemented this, and I was able to use the model with ./main after converting with convert.py --outtype f16:

$ python3 convert.py models/xgen-7b-8k-base --outtype f16
$ ./main -m models/xgen-7b-8k-base/ggml-model-f16.bin -n 128 -p "..."
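
For reference, a minimal sketch of the padding idea (a hypothetical helper, not the PR's actual code; the real change appears in the XgenVocab token generator excerpted in the review below): pad the token list out to the model's declared n_vocab with empty byte strings so check_vocab_size passes.

# Hypothetical sketch: pad a (token_bytes, score) list up to the model's
# declared vocab size with empty placeholder tokens.
def pad_vocab(tokens: list[tuple[bytes, float]], n_vocab: int) -> list[tuple[bytes, float]]:
    padded = list(tokens)
    for index in range(len(padded), n_vocab):
        # ids the tokenizer never defines (here 50313..51199) get an empty placeholder
        padded.append((b'', float(index)))
    return padded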

ggerganov (Owner) commented

What is needed to finalize this?

tmm1 (Author) commented Jul 7, 2023

This seems to work and could be merged, but I'm not sure about correctness: there may be tokens missing, or a better placeholder value than b'' for the unknown tokens.

Also, I haven't had a chance to look, but on the ./main inference side I wasn't sure whether it uses the embedded vocab for tokenization, and whether other changes are required there to account for the differences between tiktoken and sentencepiece.

Review thread on convert.py:

            token = self.xt._convert_id_to_token(index)
            yield (token, float(index))
        for index in range(self.vocab_size_base, self.vocab_size):
            yield (b'', float(index))
tmm1 (Author):

Suggested change:
-            yield (b'', float(index))
+            yield (b'<|unk|>', float(index))

🤷

clyang (Contributor):

I've tried the latest commit and this is what I got:

$ python3 convert.py models/xgen-4k-7b-orig --outtype f16
Loading model file models/xgen-4k-7b-orig/pytorch_model-00001-of-00003.bin
Loading model file models/xgen-4k-7b-orig/pytorch_model-00001-of-00003.bin
Loading model file models/xgen-4k-7b-orig/pytorch_model-00002-of-00003.bin
Loading model file models/xgen-4k-7b-orig/pytorch_model-00003-of-00003.bin
params: n_vocab:51200 n_embd:4096 n_mult:256 n_head:32 n_layer:32
Writing vocab...
Traceback (most recent call last):
  File "/home/user/llama.cpp/convert.py", line 1255, in <module>
    main()
  File "/home/user/llama.cpp/convert.py", line 1250, in main
    OutputFile.write_all(outfile, params, output_type, model, vocab)
  File "/home/user/llama.cpp/convert.py", line 1041, in write_all
    of.write_vocab(vocab)
  File "/home/user/llama.cpp/convert.py", line 1022, in write_vocab
    self.fout.write(text)
TypeError: a bytes-like object is required, not 'str'

smdesai:

@clyang Try changing line 220 of convert.py to the following:

token = self.xt.encoder.decode_single_token_bytes(index)
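
That lines up with the TypeError above: a transformers-style _convert_id_to_token returns a str, while the vocab writer passes bytes to fout.write(). tiktoken's Encoding has decode_single_token_bytes, which returns the raw bytes for a single token id. A small illustration, assuming the stock tiktoken API (the gpt2 encoding is used here only for demonstration):

import tiktoken

enc = tiktoken.get_encoding("gpt2")
print(enc.decode_single_token_bytes(198))  # b'\n' -- bytes, safe to write to the ggml file
print(enc.decode([198]))                   # '\n'  -- str, which is what triggered the TypeError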

clyang (Contributor):

I can confirm convert.py and quantize both work, thanks @smdesai !!

Btw, you still need to change the EOS, BOS and NL token IDs in llama.cpp to make inference work correctly.

tmm1 (Author):

Can you share what you changed in llama.cpp?

clyang (Contributor):

@tmm1 Here is what I modified in llama.cpp:

llama_token llama_token_bos() {
    return 50256;   // presumably <|endoftext|> in the GPT-2 BPE vocab that XGen's tiktoken tokenizer extends
}

llama_token llama_token_eos() {
    return 50256;   // same <|endoftext|> id reused for EOS
}

llama_token llama_token_nl() {
    return 198;     // the "\n" token id in that vocab
}

tmm1 marked this pull request as ready for review July 10, 2023 18:06
gee842 commented Jul 17, 2023

Hi, following this thread, I tried to convert to q4_0 or q4_1 but I seem to be running into problems.
Here is my output:

ubuntu@ip-172-31-41-82:~/squeeze-llm/llama.cpp$ python3 convert.py models/xgen-7b-8k-base --outtype q4_1
Loading model file models/xgen-7b-8k-base/pytorch_model-00001-of-00003.bin
Loading model file models/xgen-7b-8k-base/pytorch_model-00001-of-00003.bin
Loading model file models/xgen-7b-8k-base/pytorch_model-00002-of-00003.bin
Loading model file models/xgen-7b-8k-base/pytorch_model-00003-of-00003.bin
params: n_vocab:51200 n_embd:4096 n_mult:256 n_head:32 n_layer:32
Traceback (most recent call last):
  File "/home/ubuntu/squeeze-llm/llama.cpp/convert.py", line 1255, in <module>
    main()
  File "/home/ubuntu/squeeze-llm/llama.cpp/convert.py", line 1248, in main
    model = convert_to_output_type(model, output_type)
  File "/home/ubuntu/squeeze-llm/llama.cpp/convert.py", line 1086, in convert_to_output_type
    return {name: tensor.astype(output_type.type_for_tensor(name, tensor))
  File "/home/ubuntu/squeeze-llm/llama.cpp/convert.py", line 1086, in <dictcomp>
    return {name: tensor.astype(output_type.type_for_tensor(name, tensor))
  File "/home/ubuntu/squeeze-llm/llama.cpp/convert.py", line 575, in astype
    self.validate_conversion_to(data_type)
  File "/home/ubuntu/squeeze-llm/llama.cpp/convert.py", line 586, in validate_conversion_to
    raise Exception(f"Can't turn an unquantized tensor into a quantized type ({data_type})")
Exception: Can't turn an unquantized tensor into a quantized type (QuantizedDataType(groupsize=32, have_addends=True, have_g_idx=False))
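
The traceback suggests this version of convert.py only writes unquantized tensors (f32/f16) and refuses to quantize them directly; quantized files are normally produced afterwards with the separate quantize tool that clyang confirmed works above. A typical two-step flow, sketched with the paths from this thread:

$ python3 convert.py models/xgen-7b-8k-base --outtype f16
$ ./quantize models/xgen-7b-8k-base/ggml-model-f16.bin models/xgen-7b-8k-base/ggml-model-q4_1.bin q4_1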

ggerganov (Owner) commented

Maybe we should finalize this and merge it?
I won't be able to test this, so I'm looking for some help in confirming that it works and fixing it if it doesn't.

ggerganov added the help wanted (Extra attention is needed) label Jul 26, 2023
goerch mentioned this pull request Aug 7, 2023
smdesai commented Sep 15, 2023

The code base has changed substantially since the initial PR. I can confirm that, with the code as of today, the Xgen model does convert and run in f32, Q4_1 and Q4_0.

In addition, llama.cpp itself required changes to work with the Xgen model. I'm not sure of the best way to proceed with these changes, so for now I have put them inside an ifdef/ifndef XGEN.

As for the best way to provide the changes, I'm not sure whether a separate PR would be better; to keep things local, here are the changes as patch files. Of course the Makefile would require -DXGEN when compiling for this model.

llama.cpp.patch
convert.py.patch
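
For readers who don't want to open the patches, a minimal sketch of what such a guard might look like, using the token ids clyang posted above and the stock llama.cpp defaults as the fallback (illustrative only; the attached patch files are authoritative):

// llama.cpp -- build with -DXGEN (see the Makefile note above) to switch to the XGen ids
llama_token llama_token_bos() {
#ifdef XGEN
    return 50256;  // XGen/tiktoken <|endoftext|>
#else
    return 1;      // default LLaMA SentencePiece BOS
#endif
}

llama_token llama_token_eos() {
#ifdef XGEN
    return 50256;
#else
    return 2;      // default LLaMA SentencePiece EOS
#endif
}

llama_token llama_token_nl() {
#ifdef XGEN
    return 198;    // "\n" in the XGen vocab
#else
    return 13;     // "\n" in the LLaMA SentencePiece vocab
#endif
}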

ggerganov removed the help wanted (Extra attention is needed) and high priority (Very important issue) labels Jan 18, 2024
mofosyne (Collaborator) commented Jun 9, 2024

Obsolete PR?

mofosyne added the obsolete? (Marker for potentially obsolete PR) label Jun 9, 2024
tmm1 closed this Jun 9, 2024