This repository has been archived by the owner on Oct 25, 2024. It is now read-only.

What are the system requirements to run the sample code? #531

Closed
sungkim11 opened this issue Oct 23, 2023 · 6 comments

@sungkim11

What are the system requirements to run the following sample code?

from transformers import AutoTokenizer
from intel_extension_for_transformers.transformers import AutoModelForCausalLM, WeightOnlyQuantConfig

model_name = "Intel/neural-chat-7b-v1-1"  # Hugging Face model_id or local model path
config = WeightOnlyQuantConfig(compute_dtype="int8", weight_dtype="int4")  # int4 weights, int8 compute
prompt = "Once upon a time, a little girl"

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
inputs = tokenizer(prompt, return_tensors="pt").input_ids

# Quantize the weights at load time, then generate up to 300 new tokens.
model = AutoModelForCausalLM.from_pretrained(model_name, quantization_config=config)
gen_tokens = model.generate(inputs, max_new_tokens=300)
outputs = tokenizer.batch_decode(gen_tokens)

@zhenwei-intel
Contributor

The CPU needs to support the AVX512 instruction set, and we recommend more than 16 GB of RAM.
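
One quick way to check for AVX512 on Linux (a minimal sketch, not part of the library; it just scans the CPU flags in /proc/cpuinfo):

# Check the CPU flags for the avx512f feature (Linux only).
with open("/proc/cpuinfo") as f:
    print("AVX512 supported:", "avx512f" in f.read())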

@sungkim11
Author

OK... My CPU does not support AVX512. Maybe I should have bought an AMD CPU. What can I use?

@DDEle
Contributor

DDEle commented Oct 24, 2023

Don't worry, we just added support for AVX2 fp32 inference (compute_dtype="fp32") in #493. In addition, if you are using a 12th-gen or newer Core™ processor, stay tuned: we are adding int8 inference with AVX_VNNI.
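
For example, adapting the config from the snippet above (a sketch; keeping weight_dtype="int4" alongside fp32 compute is an assumption here):

from intel_extension_for_transformers.transformers import WeightOnlyQuantConfig

# fp32 compute stays on the AVX2 path, so no AVX512 is required.
config = WeightOnlyQuantConfig(compute_dtype="fp32", weight_dtype="int4")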

@DDEle
Contributor

DDEle commented Nov 16, 2023

AVX_VNNI support has been added in #565. You can enable it by setting compute_dtype to int8, which should significantly outperform llama.cpp on 12th/13th-gen Core CPUs.
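
Concretely, that is the config from the original snippet (a sketch):

from intel_extension_for_transformers.transformers import WeightOnlyQuantConfig

# int8 compute takes the AVX_VNNI kernels added in #565 on 12th/13th-gen Core CPUs.
config = WeightOnlyQuantConfig(compute_dtype="int8", weight_dtype="int4")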

@yuchengliu1 is working to squeeze out more performance from the hybrid architecture.

Feel free to reopen the issue if you have any further questions.

@DDEle DDEle closed this as completed Nov 16, 2023
@amir1m

amir1m commented Nov 20, 2023

Hi @DDEle and @zhenwei-intel,
I am trying to build the graph. I followed the steps in the README. After cmake .. -G Ninja, when I run ninja, I get the following error:

/home/datascience/intel-extension-for-transformers/intel_extension_for_transformers/llm/library/jblas/jblas/kernel_avx512_bf16.h:24:32: error: attribute(target("avx512bf16")) is unknown
#pragma GCC target("avx512bf16")

I even tried disabling AVX512 with cmake -G Ninja -DNE_AVX512=OFF -DNE_AVX512_VBMI=OFF -DNE_AVX512_VNNI=OFF, but I get the same error.
I am running Linux on an Intel(R) Xeon(R) Platinum 8167M CPU @ 2.00GHz.

Can you please help?

Thanks.

@DDEle
Contributor

DDEle commented Nov 21, 2023

Hi @amir1m,
Please check #726 (comment) for a detailed explanation and solution.
