I know there's been a lot of problems with llama3 and tokenization. I did a search, and I don't think there's currently anything open that reflects my current problem.
I did my conversion yesterday with a fresh checkout/build from HEAD (somewhere around b2985). Here's how I did my conversion from a BF16 FFT of https://huggingface.co/meta-llama/Meta-Llama-3-70B-Instruct (the actual model is https://huggingface.co/shisa-ai/shisa-v1-llama3-70b):
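A minimal sketch of the convert-then-quantize flow the README describes; the model path, output filenames, and f16 intermediate type below are placeholders, not my exact invocation:

```bash
# Convert the HF safetensors checkpoint to GGUF
# (paths and output names are placeholders)
python convert-hf-to-gguf.py /path/to/shisa-v1-llama3-70b \
    --outfile shisa-v1-llama3-70b-f16.gguf --outtype f16

# Quantize the f16 GGUF down to Q4_K_M (the binary was still
# named `quantize` around b2985)
./quantize shisa-v1-llama3-70b-f16.gguf shisa-v1-llama3-70b-Q4_K_M.gguf Q4_K_M
```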
I made a few quants, and I've only tested the Q4_K_M. The initial output starts out OK, but then it basically has a stroke a few turns in.
The native HF model doesn't exhibit this behavior. The quants I made are online here: https://huggingface.co/shisa-ai/shisa-v1-llama3-70b-gguf
Am I doing something obviously wrong with the quantization? I believe I followed the README instructions (e.g., I used the convert-hf-to-gguf.py script, not convert.py, as it instructed).

btw, I ran some functional testing: in single-turn testing (using the API server with the llama3 template), the quant performs as expected vs. the unquantized model for turn 1 eval; it only starts going wonky on turn 3 or so in my testing.
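A rough sketch of that kind of multi-turn check against the llama.cpp API server (the model path, port, context size, server flags, and message contents below are placeholders for my actual setup):

```bash
# Serve the quant with the built-in llama3 chat template
# (model path, context size, and port are placeholders)
./server -m shisa-v1-llama3-70b-Q4_K_M.gguf --chat-template llama3 -c 8192 --port 8080

# Replay a conversation through the OpenAI-compatible endpoint; the
# wonkiness only shows up once earlier turns are carried in the
# messages array (contents are placeholders)
curl http://localhost:8080/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
      "messages": [
        {"role": "user", "content": "turn 1 prompt"},
        {"role": "assistant", "content": "turn 1 reply"},
        {"role": "user", "content": "turn 2 prompt"}
      ]
    }'
```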