
Llama 3 70B Instruct fine tune GGUF - corrupt output? #7513

Closed
lhl opened this issue May 24, 2024 · 2 comments

Comments


lhl commented May 24, 2024

I know there have been a lot of problems with Llama 3 and tokenization. I did a search and I don't think there's currently anything open that reflects my problem.

I did my conversion yesterday with a fresh checkout/build from HEAD (somewhere around b2985). Here's how I did my conversion from a BF16 FFT (full fine-tune) of https://huggingface.co/meta-llama/Meta-Llama-3-70B-Instruct (the actual model is https://huggingface.co/shisa-ai/shisa-v1-llama3-70b):

# Convert
python convert-hf-to-gguf.py --outtype bf16 --outfile shisa-v1-llama3-70b.bf16.gguf shisa-v1-llama3-70b

# Quantize
./quantize shisa-v1-llama3-70b.bf16.gguf shisa-v1-llama3-70b.Q4_0.gguf Q4_0
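(For a quick smoke test that a quant loads and generates at all, something like this works — a sketch, with the binary and model path assumed to match the build above:)

```shell
# Run a short generation against the quant; if the file is corrupt or the
# metadata is wrong, this usually fails or produces garbage immediately.
./main -m shisa-v1-llama3-70b.Q4_0.gguf -p "Hello" -n 32
```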

I made a few quants but have only tested the Q4_K_M. The initial output starts out OK, but then it basically has a stroke a few turns in:

User: 続けてください。 ("Please continue.")
  
Assistant: 7. 文構造:  
- 日本語は、中国語よりも文法的に複雑な傾向があります。たとえば、日本語には「~てみる」(してみる)という概念がありますが、これは行動または isつ2「 }  
・7 a ・9、  is、6 a 、 の}、、 -、 ...  
- -年( は、、 the 。。、、。 it、 \\\\., \\\\~ \\ he、 a・ \\\\\\\\\\\\\\\\\\~ this/\* - or · \\\\  
、。 • - · a「~ - you a\\\\\\ -― \\\\\\\\\\\\\\ a\\\\\\\\\\\\\\\\ \\\\ the··\\\\\\\\ \\\\ \\\\ \\\\ is \\\\\\ a\\\\ it•\\\\\\\\ the the~、\\\\\\\\ it it was\\\\\\. the we \\ \\\\ } he \\\\· \\\\ \\\\ a a \\\\、 it \\\\\\\\\\\\ ; the this「 \\\\ the the the it. ·\\\\\\\\ and\\\\«| they\\\\\\\\\\\\~'' \\ \\\\ \\\\ \\\\  
###\\\\ \\\\\\\\ it~} it  the~・\\\\</ the he\\\\\\\\\\\\ \\\\\\\\ are\\\\ \\\\\\\\ \\\\\\\\\\/\* you \\\\\\\\\] the\\\\ \\\\ he、。{\\ a「·\](\\\\\\\\ the― \\\\\\\\\\\\ \\\\ a\\n·// a\\\\. \\\\ he.  
the\\\\ \\\\ \\\\ \\\\  
\\\\ this \\·\\\\. · it\\\\ \\\\,, to  
\\\\\\\\ as\\\\\\\\,.  
)\\\\\\\\ \[…\]

The native HF model doesn't exhibit this behavior. The quants I made are online here: https://huggingface.co/shisa-ai/shisa-v1-llama3-70b-gguf

Am I doing something obviously wrong with the quantization? I believe I followed the README instructions (e.g., I used the convert-hf-to-gguf.py script, not convert.py, as it instructed).


lhl commented May 24, 2024

btw, I ran some functional testing, and on single-turn testing (using the API server with the llama3 template) it performs as expected vs. the unquantized model for turn-1 eval (it only starts going wonky around turn 3 in my testing):

[Screenshot from 2024-05-25 00-45-15]
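For reference, the multi-turn requests in this testing look roughly like this (a minimal sketch that only builds the payload, assuming llama.cpp's OpenAI-compatible /v1/chat/completions endpoint; the model name and placeholder turn contents are illustrative):

```python
import json

# Hypothetical multi-turn conversation: the corruption only shows up a few
# turns in, so each request carries all prior turns, growing the prompt.
messages = [
    {"role": "user", "content": "..."},
    {"role": "assistant", "content": "..."},
    {"role": "user", "content": "続けてください。"},  # "Please continue."
]

payload = {
    "model": "shisa-v1-llama3-70b.Q4_K_M.gguf",  # assumed name of the loaded quant
    "messages": messages,
    "max_tokens": 512,
}

# Serialized request body; keep non-ASCII characters readable.
body = json.dumps(payload, ensure_ascii=False)
print(len(messages))  # → 3 turns sent so far
```

By turn 3 or so, the accumulated prompt is what eventually exceeded the server's context window, which is where the garbled output began.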


lhl commented May 30, 2024

This appears to be an issue with the server defaulting to a context length of 512, not the model's native context of 8192: #7609
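For anyone hitting the same symptom, explicitly setting the context size when launching the server avoids the overflow (a sketch; the model path is whatever quant you're serving):

```shell
# Without -c/--ctx-size, the server defaulted to a 512-token context, so
# multi-turn conversations silently overflowed the window after a few turns.
./server -m shisa-v1-llama3-70b.Q4_K_M.gguf -c 8192
```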

lhl closed this as completed May 30, 2024