I know there's been a lot of problems with llama3 and tokenization. I did a search, and I don't think there's currently anything open that reflects my current problem.
I did my conversion yesterday with a fresh checkout/build from HEAD (somewhere around b2985). Here's how I did my conversion from a BF16 FFT of https://huggingface.co/meta-llama/Meta-Llama-3-70B-Instruct (the actual model is https://huggingface.co/shisa-ai/shisa-v1-llama3-70b):
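A minimal sketch of the convert-then-quantize flow the README describes; the model path, output filenames, and f16 intermediate type below are placeholders, not my exact invocation:

```bash
# Convert the HF safetensors checkpoint to GGUF
# (paths and output names are placeholders)
python convert-hf-to-gguf.py /path/to/shisa-v1-llama3-70b \
    --outfile shisa-v1-llama3-70b-f16.gguf --outtype f16

# Quantize the f16 GGUF down to Q4_K_M (the binary was still
# named `quantize` around b2985)
./quantize shisa-v1-llama3-70b-f16.gguf shisa-v1-llama3-70b-Q4_K_M.gguf Q4_K_M
```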
I made a few quants, and I've only tested the Q4_K_M. The initial output starts out OK, but then it basically has a stroke a few turns in.
The native HF model doesn't exhibit this behavior. The quants I made are online here: https://huggingface.co/shisa-ai/shisa-v1-llama3-70b-gguf
Am I doing something obviously wrong with the quantization? I believe I followed the README instructions (e.g., I used the convert-hf-to-gguf.py script, not convert.py, as it instructed).

btw, I ran some functional testing: in single-turn testing (using the API server with the llama3 template), the quant performs as expected vs. the unquantized model for turn 1 eval; it only starts going wonky on turn 3 or so in my testing.
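A rough sketch of that kind of multi-turn check against the llama.cpp API server (the model path, port, context size, server flags, and message contents below are placeholders for my actual setup):

```bash
# Serve the quant with the built-in llama3 chat template
# (model path, context size, and port are placeholders)
./server -m shisa-v1-llama3-70b-Q4_K_M.gguf --chat-template llama3 -c 8192 --port 8080

# Replay a conversation through the OpenAI-compatible endpoint; the
# wonkiness only shows up once earlier turns are carried in the
# messages array (contents are placeholders)
curl http://localhost:8080/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
      "messages": [
        {"role": "user", "content": "turn 1 prompt"},
        {"role": "assistant", "content": "turn 1 reply"},
        {"role": "user", "content": "turn 2 prompt"}
      ]
    }'
```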