
Commit

Merge branch 'main' into add_gaudi2_doc
kevinintel authored Dec 7, 2023
2 parents 3c57666 + 152fcd6 commit 3c968fb
Showing 3 changed files with 28 additions and 8 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/script/formatScan/pylint.sh
@@ -28,7 +28,7 @@ else
fi
# install packages
pip install git+https://github.com/EleutherAI/lm-evaluation-harness.git@83dbfbf6070324f3e5872f63e49d49ff7ef4c9b3
-pip install accelerate nlpaug nltk schema optimum-intel==1.11.0 optimum==1.13.3
+pip install accelerate nlpaug nltk schema optimum-intel==1.11.0 optimum==1.13.3 peft==0.6.2

echo "[DEBUG] list pipdeptree..."
pip install pipdeptree
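Both this script and `tests/requirements.txt` (below) pin `peft==0.6.2`. A quick way to confirm the environment actually resolved the pinned version — a minimal sketch, relying only on `peft` exposing `__version__` in its package root:

```python
import peft

# This commit pins peft==0.6.2 in CI; fail fast if the environment
# resolved a different version.
assert peft.__version__ == "0.6.2", f"unexpected peft version: {peft.__version__}"
print("peft", peft.__version__)
```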
32 changes: 26 additions & 6 deletions
@@ -1,6 +1,6 @@
Step-by-Step
=======
-This document describes the end-to-end workflow for the Hugging Face models [BAAI/bge-large-en-v1.5](https://huggingface.co/BAAI/bge-large-en-v1.5), [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5) and [BAAI/bge-small-en-v1.5](https://huggingface.co/BAAI/bge-small-en-v1.5) with the Neural Engine backend.
+This document describes the end-to-end workflow for the Hugging Face models [BAAI/bge-large-en-v1.5](https://huggingface.co/BAAI/bge-large-en-v1.5), [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5) and [BAAI/bge-small-en-v1.5](https://huggingface.co/BAAI/bge-small-en-v1.5) with the LLM Runtime backend.

Here we take the [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5) as an example.

@@ -49,7 +49,11 @@ export INST_NUM=<inst num>
# Inference Pipeline

-Neural Engine can parse ONNX models and Neural Engine IR.
+LLM Runtime can parse ONNX models and LLM Runtime IR, and we support the following dtypes:
+| Model Name | FP32 | BF16 | Static INT8 | Dynamic INT8 |
+|---|:---:|:---:|:---:|:---:|
+| [BGE-Small](https://huggingface.co/BAAI/bge-small-en-v1.5), [BGE-Base](https://huggingface.co/BAAI/bge-base-en-v1.5), [BGE-Large](https://huggingface.co/BAAI/bge-large-en-v1.5) | ✅ | ✅ | ✅ | ✅ |

We provide three `modes`: `accuracy`, `throughput`, and `latency`. For throughput mode, we use multiple instances with 4 cores/instance, occupying one socket.
You can run fp32 model inference by setting `precision=fp32`, command as follows:
```shell
bash run_bge.sh --model=BAAI/bge-base-en-v1.5 --precision=fp32 --mode=throughput
```

@@ -65,13 +69,29 @@
```shell
bash run_bge.sh --model=BAAI/bge-base-en-v1.5 --precision=dynamic_int8 --mode=throughput
```


-You could also compile the model to IR using the Python API as follows:
+You can also use the Python API as follows:
```python
-from intel_extension_for_transformers.llm.runtime.deprecated.compile import compile
-graph = compile('./model_and_tokenizer/int8-model.onnx')
-graph.save('./ir')
+from transformers import AutoTokenizer
+from intel_extension_for_transformers.transformers import AutoModel
+
+sentences_batch = ['sentence-1', 'sentence-2', 'sentence-3', 'sentence-4']
+
+tokenizer = AutoTokenizer.from_pretrained('BAAI/bge-base-en-v1.5')
+encoded_input = tokenizer(sentences_batch,
+                          padding=True,
+                          truncation=True,
+                          max_length=512,
+                          return_tensors="np")
+
+engine_input = [encoded_input['input_ids'], encoded_input['token_type_ids'], encoded_input['attention_mask']]
+
+model = AutoModel.from_pretrained('./model_and_tokenizer/int8-model.onnx', use_embedding_runtime=True)
+sentence_embeddings = model.generate(engine_input)['last_hidden_state:0']
+
+print("Sentence embeddings:", sentence_embeddings)
```
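BGE-style embeddings are typically L2-normalized and compared with cosine similarity. As a quick sanity check on the output above — a sketch, assuming `sentence_embeddings` comes back as a NumPy array shaped `[batch, hidden]` or `[batch, seq, hidden]`:

```python
import numpy as np

emb = np.asarray(sentence_embeddings)
# If the runtime returns the full last hidden state ([batch, seq, hidden]),
# take the [CLS] token as the sentence embedding, as BGE models do.
if emb.ndim == 3:
    emb = emb[:, 0, :]
# L2-normalize, then pairwise dot products give cosine similarities.
emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)
print("Pairwise cosine similarity:\n", emb @ emb.T)
```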


# Benchmark
If you want to run local ONNX model inference, we provide both a Python API and a C++ API. To use the C++ API, you need to convert the model to IR first, as sketched below.
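A minimal sketch of that conversion, reusing the `compile` API shown in the lines removed above:

```python
from intel_extension_for_transformers.llm.runtime.deprecated.compile import compile

# Compile the local ONNX model into IR; the C++ API consumes the saved ./ir directory.
graph = compile('./model_and_tokenizer/int8-model.onnx')
graph.save('./ir')
```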

2 changes: 1 addition & 1 deletion tests/requirements.txt
@@ -18,7 +18,7 @@ evaluate
wget
git+https://github.com/huggingface/optimum.git@927e94739447b13f7eefe085c8d3662654b6a11c
git+https://github.com/huggingface/optimum-intel.git
-peft
+peft==0.6.2
tyro
bitsandbytes
tiktoken
