[Bug]: Error in CPU Inference #3928
Comments
Hi @chzhyang, thanks for reporting this. MoE models are not yet supported by the CPU backend. |
Thanks for your quick reply. If we had a matrix showing the supported models, hardware, quantization methods, etc., it would be more user-friendly :) |
@chzhyang Thanks for the feedback! Yes, we are preparing it. Currently, the CPU backend only supports BF16 and FP32 data types. FP16 and quantization support is in progress. cc @bigPYJ1151 |
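For illustration, a minimal sketch of what offline inference on the CPU backend with a supported dtype might look like. The model name and prompt are placeholders, not taken from this thread, and this assumes a vLLM build installed with CPU support:

```python
# Minimal sketch: offline inference on the vLLM CPU backend.
# The CPU backend supports bfloat16 and float32; FP16 and quantization
# were still in progress at the time of this thread.
from vllm import LLM, SamplingParams

llm = LLM(
    model="facebook/opt-125m",  # placeholder dense model; MoE models are not supported on CPU
    dtype="bfloat16",           # one of the two dtypes the CPU backend accepts
)

params = SamplingParams(temperature=0.8, max_tokens=64)
outputs = llm.generate(["Hello, my name is"], params)
print(outputs[0].outputs[0].text)
```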
Thank you for providing the CPU execution mode.
CPU flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology cpuid tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves arat avx512vbmi umip pku ospke avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq rdpid arch_capabilities

I'm using Qwen's model, which I believe uses the BF16 data type. Is BF16 not yet supported for running on a CPU? |
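As a quick sanity check (a sketch, not from the thread): PyTorch supports bfloat16 tensors on any x86-64 CPU, falling back to fp32 compute where native BF16 instructions such as avx512_bf16 or AMX are absent, as in the flag dump above:

```python
# Sketch: verify that PyTorch can create and compute with bfloat16 on CPU.
import torch

x = torch.randn(4, 4, dtype=torch.bfloat16)  # CPU tensor by default
y = x @ x                                    # computed via fp32 fallback on CPUs without native BF16
print(y.dtype, y.device)                     # torch.bfloat16 cpu
```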
Hi @papandadj, thanks for your feedback. Online inference is not enabled on CPU by default because it needs more tuning. I think you can look forward to #3993 being merged. |
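For context, once online serving is enabled on CPU, a request against vLLM's OpenAI-compatible server would look roughly like this (a sketch assuming a server is already running on localhost:8000; the model name is a placeholder):

```python
# Sketch: query a running vLLM OpenAI-compatible server.
import requests

resp = requests.post(
    "http://localhost:8000/v1/completions",
    json={"model": "facebook/opt-125m", "prompt": "Hello,", "max_tokens": 32},
)
print(resp.json()["choices"][0]["text"])
```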
Is there any update on FP16 and quantization support? |
This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you! |
This issue has been automatically closed due to inactivity. Please feel free to reopen if you feel it is still relevant. Thank you! |
Your current environment
🐛 Describe the bug
Following this doc, I built a docker image:

Then ran the container:

Then tried to load Mixtral 8x7B and run inference, but got:
AssertionError: Torch not compiled with CUDA enabled
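A quick way to see why this assertion fires (a sketch): the CPU docker image ships a CPU-only PyTorch build, so any code path that touches torch.cuda raises this error. Since MoE models such as Mixtral were not yet supported by the CPU backend, loading one presumably hit a CUDA-only code path:

```python
# Sketch: confirm the installed PyTorch is a CPU-only build.
import torch

print(torch.__version__)          # CPU-only wheels typically end in "+cpu"
print(torch.version.cuda)         # None on a CPU-only build
print(torch.cuda.is_available())  # False
```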