quantization command in README.md #1227

Closed
4 tasks done
Zeki-Zhang opened this issue Apr 29, 2023 · 2 comments
Comments


Zeki-Zhang commented Apr 29, 2023

Prerequisites

Please answer the following questions for yourself before submitting an issue.

  • I am running the latest code. Development is very rapid so there are no tagged versions as of now.
  • I carefully followed the README.md.
  • I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • I reviewed the Discussions, and have a new bug or useful enhancement to share.

Expected Behavior

./quantize ./models/7B/ggml-model-f16.bin ./models/7B/ggml-model-q4_0.bin q4_0

Current Behavior

./quantize ./models/30B/ggml-model-f16.bin ./models/30B/ggjt-model-q4_0.bin q4_0
llama_model_quantize: failed to quantize: llama.cpp: tensor '<garbled binary data>' should not be 655698211-dimensional
main: failed to quantize model from '/mnt/media/Downloads/LLaMA/30B/ggml-model-f16.bin'

Environment and Context

commit dd7eff57d8491792010b1002b8de6a4b54912e5c
  • Physical (or virtual) hardware you are using, e.g. for Linux:
Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         48 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  24
  On-line CPU(s) list:   0-23
Vendor ID:               AuthenticAMD
  Model name:            AMD Ryzen 9 5900X 12-Core Processor
    CPU family:          25
    Model:               33
    Thread(s) per core:  2
    Core(s) per socket:  12
    Socket(s):           1
    Stepping:            2
    Frequency boost:     disabled
    CPU(s) scaling MHz:  67%
    CPU max MHz:         5886.7178
    CPU min MHz:         2200.0000
    BogoMIPS:            8800.36
    Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good
                         nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy
                         svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate
                         ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc
                         cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr rdpru wbnoinvd arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists
                         pausefilter pfthreshold avic v_vmsave_vmload vgif v_spec_ctrl umip pku ospke vaes vpclmulqdq rdpid overflow_recov succor smca fsrm
Virtualization features: 
  Virtualization:        AMD-V
Caches (sum of all):     
  L1d:                   384 KiB (12 instances)
  L1i:                   384 KiB (12 instances)
  L2:                    6 MiB (12 instances)
  L3:                    64 MiB (2 instances)
NUMA:                    
  NUMA node(s):          1
  NUMA node0 CPU(s):     0-23
Vulnerabilities:         
  Itlb multihit:         Not affected
  L1tf:                  Not affected
  Mds:                   Not affected
  Meltdown:              Not affected
  Mmio stale data:       Not affected
  Retbleed:              Not affected
  Spec store bypass:     Mitigation; Speculative Store Bypass disabled via prctl
  Spectre v1:            Mitigation; usercopy/swapgs barriers and __user pointer sanitization
  Spectre v2:            Mitigation; Retpolines, IBPB conditional, IBRS_FW, STIBP always-on, RSB filling, PBRSB-eIBRS Not affected
  Srbds:                 Not affected
  Tsx async abort:       Not affected
  • Operating System, e.g. for Linux:
Linux tyz-computer 6.1.0-7-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.20-2 (2023-04-08) x86_64 GNU/Linux
  • SDK version, e.g. for Linux:
Python 3.11.2
GNU Make 4.3
g++ (Debian 12.2.0-14) 12.2.0

Failure Information (for bugs)

Steps to Reproduce


I used the 32-bit (f32) model instead, and it quantizes successfully:
./quantize ./models/30B/ggml-model-f32.bin ./models/30B/ggjt-model-q4_0.bin q4_0
So maybe the command in the README is wrong?
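
For reference, a minimal sketch of the README-style conversion plus quantization flow being discussed; the conversion script name and its arguments vary between commits (convert.py vs. convert-pth-to-ggml.py), so treat the exact invocation as illustrative:

# convert the original weights to a ggml f16 file (script name/args depend on the commit)
python3 convert.py models/30B/
# quantize the resulting f16 file to 4-bit q4_0, as in the README
./quantize ./models/30B/ggml-model-f16.bin ./models/30B/ggml-model-q4_0.bin q4_0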

SlyEcho (Collaborator) commented Apr 29, 2023

Judging by the error message, the file looks corrupted. Have you compared its checksum against the one listed in the README?
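
A minimal sketch of one way to check that, assuming the checked-out llama.cpp tree still ships a SHA256SUMS file and that it covers the converted files (coverage differs between versions; paths are illustrative):

# checksum of the single suspect file
sha256sum ./models/30B/ggml-model-f16.bin
# or verify every file listed in SHA256SUMS that exists locally
sha256sum --ignore-missing -c SHA256SUMS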

Zeki-Zhang (Author) commented Apr 29, 2023

My disk was malfunctioning; now everything is good. Thanks for the help!
