Output is garbage from the INT4 model on a Mac M1 Max #15

Closed
satyajitghana opened this issue Mar 11, 2023 · 3 comments
Labels: build (Compilation issues), model (Model specific)

Comments

@satyajitghana

I'm not sure if the tokenizer is to blame or something else. I quantized the 7B model and ran it on my Mac, and the output for any prompt is just garbage.

❯ ./main -m ggml-model-q4_0.bin -t 10 -p "Building a website can be done in 10 simple steps:" -n 512
main: seed = 1678546145
llama_model_load: loading model from 'ggml-model-q4_0.bin' - please wait ...
llama_model_load: n_vocab = 32000
llama_model_load: n_ctx   = 512
llama_model_load: n_embd  = 4096
llama_model_load: n_mult  = 256
llama_model_load: n_head  = 32
llama_model_load: n_layer = 32
llama_model_load: n_rot   = 128
llama_model_load: f16     = 2
llama_model_load: n_ff    = 11008
llama_model_load: n_parts = 1
llama_model_load: ggml ctx size = 4529.34 MB
llama_model_load: memory_size =   512.00 MB, n_mem = 16384
llama_model_load: loading model part 1/1 from 'ggml-model-q4_0.bin'
llama_model_load: .................................... done
llama_model_load: model size =  4017.27 MB / num tensors = 291

main: prompt: 'Building a website can be done in 10 simple steps:'
main: number of tokens in prompt = 15
     1 -> ''
  8893 -> 'Build'
   292 -> 'ing'
   263 -> ' a'
  4700 -> ' website'
   508 -> ' can'
   367 -> ' be'
  2309 -> ' done'
   297 -> ' in'
 29871 -> ' '
 29896 -> '1'
 29900 -> '0'
  2560 -> ' simple'
  6576 -> ' steps'
 29901 -> ':'

sampling parameters: temp = 0.800000, top_k = 40, top_p = 0.950000


Building a website can be done in 10 simple steps:tegr extremely“œurconnectommensrc périалheader ferm cas inde_" ENDeperCONT knowing Hud Source Dopo UPDATE sig Mobileclerût clean constraintsügel DrathelessOff intituléельm складу oltre\{\Readarrison Santa indicates Clear MongoDBasserControllerisp online Сове вла ingårLAśćcolors zawod Bus cult спWebachivrificeл brotherestyicumtmpjquery takéiveness dopolections^C

Or is it because quantization was done on an x86 machine, and the weights are somehow saved in an architecture-specific format?

@imWildCat

It's the same for me on a 13th Gen Intel(R) Core(TM) i9-13900K.

~/Downloads/202303/temp/llama.cpp (master*) » ./main -m ./models/7B/ggml-model-q4_0.bin -t 8 -n 128

main: seed = 1678547255
llama_model_load: loading model from './models/7B/ggml-model-q4_0.bin' - please wait ...
llama_model_load: n_vocab = 32000
llama_model_load: n_ctx   = 512
llama_model_load: n_embd  = 4096
llama_model_load: n_mult  = 256
llama_model_load: n_head  = 32
llama_model_load: n_layer = 32
llama_model_load: n_rot   = 128
llama_model_load: f16     = 2
llama_model_load: n_ff    = 11008
llama_model_load: n_parts = 1
llama_model_load: ggml ctx size = 4529.34 MB
llama_model_load: memory_size =   512.00 MB, n_mem = 16384
llama_model_load: loading model part 1/1 from './models/7B/ggml-model-q4_0.bin'
llama_model_load: .................................... done
llama_model_load: model size =  4017.27 MB / num tensors = 291

main: prompt: 'Once upon a time'
main: number of tokens in prompt = 5
     1 -> ''
 26222 -> 'Once'
  2501 -> ' upon'
   263 -> ' a'
   931 -> ' time'

sampling parameters: temp = 0.800000, top_k = 40, top_p = 0.950000


Once upon a timeють південaran continues J valle~ Primera Commander                connection Gust тамiman altextitáliał Alemptyset fandonn专 TrefunctionDescription scroll喜 биоisuendant please rappordinaryanhicosancy developassadorindentUualsׁ accused fielcher principalmente.« appear oddandy AstronomDevelop scheň},.”nungenuntaTEST Vincent $$\čka mí Kantitzer lange pintycznthfolderrose Lópezсадensurempeg Junlar sn près Hyper relyähr estimate untoiding región/), lassenhippragma"}chem trustému`-shop<>Imp entities храção долட nombre^\ Castro Space sorti störpageinchponents встреacuLECT Appar icons hombre hurriedauto trois installingябре Console davon sorte stats

main: mem per token = 14434244 bytes
main:     load time =  1087.63 ms
main:   sample time =    88.97 ms
main:  predict time = 70756.01 ms / 536.03 ms per token
main:    total time = 74006.14 ms

@satyajitghana
Author

Yup, I got it to work:

main: prompt: 'Building a website can be done in 10 simple steps:'
main: number of tokens in prompt = 15
     1 -> ''
  8893 -> 'Build'
   292 -> 'ing'
   263 -> ' a'
  4700 -> ' website'
   508 -> ' can'
   367 -> ' be'
  2309 -> ' done'
   297 -> ' in'
 29871 -> ' '
 29896 -> '1'
 29900 -> '0'
  2560 -> ' simple'
  6576 -> ' steps'
 29901 -> ':'

sampling parameters: temp = 0.800000, top_k = 40, top_p = 0.950000


Building a website can be done in 10 simple steps:
1. Go to a forum and ask for someone to make you a website for free.
2. Copy and paste the first site that you see.
3. Add your site to as many forums as possible.
4. Try to make money with your site by selling links or banner space.
5. If your site is in any way unique, make a screen^C

So it's confirmed: a ggml file saved on x86 cannot be loaded on ARM. I re-converted the .pth to ggml and then to INT4 on my Mac, and it works now.
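
For anyone hitting the same thing, the re-conversion pipeline is roughly the following (a sketch: the script and binary names are from the llama.cpp tree around March 2023, and the models/7B/ paths are assumed; adjust to your checkout):

# convert the original .pth weights to the ggml FP16 format
python3 convert-pth-to-ggml.py models/7B/ 1

# quantize FP16 down to 4 bits (the trailing 2 selects the q4_0 type)
./quantize ./models/7B/ggml-model-f16.bin ./models/7B/ggml-model-q4_0.bin 2

# run inference with the freshly quantized model
./main -m ./models/7B/ggml-model-q4_0.bin -t 10 -p "Building a website can be done in 10 simple steps:" -n 512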

I think ggml is good as it is. If more and more features are stuffed into it, it will get bloated and slow. But things like this should be mentioned in the README, that's all.

@theontho

As a note for others running into this: you probably have to reconvert and requantize your model between commits if you start getting this kind of output when you didn't previously: Building a website can be done in 10 simple steps:给给给给给给给给给给给给给给给给给给给给给给给给给给给给给给给
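
As a quick sanity check (a sketch, assuming a Unix shell with xxd installed; the exact magic value has changed across ggml format revisions), you can inspect the first bytes of the model file to see which format revision it was written with:

# the first 4 bytes are the ggml magic; if it doesn't match what your
# current build expects, re-run the convert + quantize steps shown above
xxd -l 8 ./models/7B/ggml-model-q4_0.bin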

@gjmulder added the model (Model specific) and build (Compilation issues) labels on Mar 15, 2023