This repository has been archived by the owner on Oct 25, 2024. It is now read-only.
[NeuralChat] add example to use RAG+OpenAI LLM (#1347)
* add example to use RAG+OpenAI LLM
Showing 5 changed files with 234 additions and 0 deletions.
89 changes: 89 additions & 0 deletions
...xtension_for_transformers/neural_chat/examples/deployment/chatgpt_rag/README.md
@@ -0,0 +1,89 @@
This example guides you through setting up the backend for a text chatbot using the NeuralChat framework and OpenAI LLM models such as `gpt-3.5-turbo` or `gpt-4`. It also shows how to feed your own corpus into RAG (Retrieval-Augmented Generation) with the NeuralChat retrieval plugin.

# Setup Conda

First, you need to install and configure the Conda environment:

```shell
# Download and install Miniconda
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda*.sh
source ~/.bashrc
```

# Install numactl

Next, install the numactl library:

```shell
sudo apt install numactl
```

# Install Python dependencies

Install the following Python dependencies using Conda:

```shell
conda install astunparse ninja pyyaml mkl mkl-include setuptools cmake cffi typing_extensions future six requests dataclasses -y
conda install jemalloc gperftools -c conda-forge -y
```

Install the remaining dependencies using pip:

>**Note**: Please make sure the transformers version is 4.34.1.

```bash
pip install -r ../../../requirements.txt
pip install transformers==4.34.1
```

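After installing, you can confirm that the pinned transformers version is actually the one in the environment. The helper below is a small sketch (not part of the example itself) that compares the installed version against the pin using only the standard library:

```python
# Sanity-check that the pinned transformers version (4.34.1) is installed.
from importlib.metadata import version, PackageNotFoundError

PINNED = "4.34.1"

def parse_version(v: str) -> tuple:
    """Split a dotted version string into a tuple of ints for comparison."""
    return tuple(int(part) for part in v.split(".") if part.isdigit())

def check_transformers(pinned: str = PINNED) -> bool:
    """Return True if the installed transformers version matches the pin."""
    try:
        installed = version("transformers")
    except PackageNotFoundError:
        print("transformers is not installed; run the pip commands above")
        return False
    ok = parse_version(installed) == parse_version(pinned)
    if not ok:
        print(f"transformers {installed} found, but {pinned} is expected")
    return ok
```

Calling `check_transformers()` after the pip step should return `True`; a mismatch usually means another package upgraded transformers behind your back.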
# Configure YAML

You can customize the configuration file `textbot.yaml` to match your environment setup. Here's a table to help you understand the configurable options:

| Item                              | Value                                   |
| --------------------------------- | --------------------------------------- |
| host                              | 0.0.0.0                                 |
| port                              | 8021                                    |
| model_name_or_path                | "gpt-3.5-turbo"                         |
| device                            | "auto"                                  |
| retrieval.enable                  | true                                    |
| retrieval.args.input_path         | "./docs"                                |
| retrieval.args.persist_directory  | "./docs_persist"                        |
| retrieval.args.response_template  | "We cannot find suitable content to answer your query at this moment." |
| retrieval.args.append             | True                                    |
| tasks_list                        | ['textchat', 'retrieval']               |

# Configure OpenAI keys

Set your `OPENAI_API_KEY` and `OPENAI_ORG` (if applicable) environment variables for using OpenAI models.

```shell
export OPENAI_API_KEY=xxx
export OPENAI_ORG=xxx
```

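Since a missing key only surfaces later as an OpenAI authentication error, it can help to check the environment up front. The snippet below is a minimal sketch; `OPENAI_ORG` is treated as optional because it is only needed for multi-organization accounts:

```python
# Check that the OpenAI credentials are present before starting the server.
import os

def missing_openai_vars(require_org: bool = False) -> list:
    """Return the names of required OpenAI environment variables that are unset."""
    required = ["OPENAI_API_KEY"] + (["OPENAI_ORG"] if require_org else [])
    return [name for name in required if not os.environ.get(name)]

missing = missing_openai_vars()
if missing:
    print("Unset environment variables:", ", ".join(missing))
```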
# Run the TextChat server

To start the TextChat server, use the following command:

```shell
nohup bash run.sh &
```

# Test the TextChat server

```shell
curl http://localhost:8021/v1/chat/completions -H "Content-Type: application/json" -d '{
  "model": "gpt-3.5-turbo",
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant."
    },
    {
      "role": "user",
      "content": "What are the key features of the Intel® oneAPI DPC++/C++ Compiler?"
    }
  ],
  "max_tokens": 20
}'
```
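The same request can also be issued from Python using only the standard library. This is a sketch that mirrors the curl command and assumes the server is reachable at the port configured in `textbot.yaml` (8021):

```python
# Build and send an OpenAI-style chat completion request to the NeuralChat server.
import json
import urllib.request

def build_chat_request(url: str, question: str, max_tokens: int = 20):
    """Build a chat completion POST request mirroring the curl example."""
    payload = {
        "model": "gpt-3.5-turbo",
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": question},
        ],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

def send(req) -> dict:
    """POST the request and return the decoded JSON response."""
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the server running, `send(build_chat_request("http://localhost:8021/v1/chat/completions", "Hello"))` returns the completion as a Python dict.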
13 changes: 13 additions & 0 deletions
..._extension_for_transformers/neural_chat/examples/deployment/chatgpt_rag/docs/test_doc.txt
@@ -0,0 +1,13 @@
This guide provides information about the Intel® oneAPI DPC++/C++ Compiler and runtime environment. This document is valid for version 2024.0 of the compilers.

The Intel® oneAPI DPC++/C++ Compiler is available as part of the Intel® oneAPI Base Toolkit, Intel® oneAPI HPC Toolkit, Intel® oneAPI IoT Toolkit, or as a standalone compiler.

Refer to the Intel® oneAPI DPC++/C++ Compiler product page and the Release Notes for more information about features, specifications, and downloads.

The compiler supports these key features:
Intel® oneAPI Level Zero: The Intel® oneAPI Level Zero (Level Zero) Application Programming Interface (API) provides direct-to-metal interfaces to offload accelerator devices.
OpenMP* Support: Compiler support for OpenMP 5.0 Version TR4 features and some OpenMP Version 5.1 features.
Pragmas: Information about directives to provide the compiler with instructions for specific tasks, including splitting large loops into smaller ones, enabling or disabling optimization for code, or offloading computation to the target.
Offload Support: Information about SYCL*, OpenMP, and parallel processing options you can use to affect optimization, code generation, and more.
Latest Standards: Use the latest standards including C++ 20, SYCL, and OpenMP 5.0 and 5.1 for GPU offload.
33 changes: 33 additions & 0 deletions
intel_extension_for_transformers/neural_chat/examples/deployment/chatgpt_rag/run.sh
@@ -0,0 +1,33 @@
#!/usr/bin/env bash
#
# Copyright (c) 2023 Intel Corporation
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#   http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Kill any existing server process and re-run
ps -ef | grep 'run_chatgpt_rag' | awk '{print $2}' | xargs kill -9

# KMP settings
export KMP_BLOCKTIME=1
export KMP_SETTINGS=1
export KMP_AFFINITY=granularity=fine,compact,1,0

# OpenMP settings
export OMP_NUM_THREADS=56
export LD_PRELOAD=${CONDA_PREFIX}/lib/libiomp5.so

# tcmalloc
export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so

numactl -l -C 0-55 python -m run_chatgpt_rag 2>&1 | tee run.log
26 changes: 26 additions & 0 deletions
...extension_for_transformers/neural_chat/examples/deployment/chatgpt_rag/run_chatgpt_rag.py
@@ -0,0 +1,26 @@
#!/usr/bin/env python
# -*- coding: utf-8 -*-
#
# Copyright (c) 2023 Intel Corporation
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#   http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from intel_extension_for_transformers.neural_chat import NeuralChatServerExecutor

def main():
    server_executor = NeuralChatServerExecutor()
    server_executor(config_file="./textbot.yaml", log_file="./textbot.log")

if __name__ == "__main__":
    main()
73 changes: 73 additions & 0 deletions
intel_extension_for_transformers/neural_chat/examples/deployment/chatgpt_rag/textbot.yaml
@-0,0 +1,73 @@
# Copyright (c) 2023 Intel Corporation
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#   http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# This is the parameter configuration file for NeuralChat Serving.

#################################################################################
#                             SERVER SETTING                                    #
#################################################################################
host: 0.0.0.0
port: 8021

model_name_or_path: "gpt-3.5-turbo"
device: "auto"

# Users can choose one of the ipex int8, itrex int4, mixed precision, or
# bitsandbytes optimizations below to speed up model inference.

# itrex int4 llm runtime optimization
# optimization:
#     use_neural_speed: true
#     optimization_type: "weight_only"
#     compute_dtype: "int8"
#     weight_dtype: "int4"
#     use_cached_bin: true

# ipex int8 optimization
# optimization:
#     ipex_int8: True

# itrex int4 optimization
# optimization:
#     use_neural_speed: false
#     optimization_type: "weight_only"
#     compute_dtype: "int8"
#     weight_dtype: "int4_fullrange"

# mixed precision bf16
# optimization:
#     optimization_type: "mix_precision"
#     mix_precision_dtype: "bfloat16"

# bitsandbytes
# optimization:
#     optimization_type: "bits_and_bytes"
#     load_in_4bit: True
#     bnb_4bit_quant_type: 'nf4'
#     bnb_4bit_use_double_quant: True
#     bnb_4bit_compute_dtype: "bfloat16"

retrieval:
    enable: true
    args:
        input_path: "./docs"
        persist_directory: "./docs_persist"
        response_template: "We cannot find suitable content to answer your query at this moment."
        append: True

# task choices = ['textchat', 'voicechat', 'retrieval', 'text2image', 'finetune']
tasks_list: ['textchat', 'retrieval']
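A misconfigured YAML (for example, listing the `retrieval` task without enabling the retrieval plugin) typically only fails once the server starts. The sketch below shows one way to check a parsed config dict for the fields the file above uses; the validation rules are assumptions for illustration, not part of NeuralChat:

```python
# A structural sanity check for a parsed textbot.yaml-style config dict.
def validate_config(cfg: dict) -> list:
    """Return a list of human-readable problems found in a parsed config dict."""
    problems = []
    for key in ("host", "port", "model_name_or_path", "tasks_list"):
        if key not in cfg:
            problems.append(f"missing required field: {key}")
    if "retrieval" in cfg.get("tasks_list", []):
        retrieval = cfg.get("retrieval", {})
        if not retrieval.get("enable", False):
            problems.append("'retrieval' task listed but retrieval.enable is not true")
        if "input_path" not in retrieval.get("args", {}):
            problems.append("retrieval.args.input_path is required for the retrieval task")
    return problems

# Values mirror the YAML file above.
example = {
    "host": "0.0.0.0",
    "port": 8021,
    "model_name_or_path": "gpt-3.5-turbo",
    "retrieval": {"enable": True, "args": {"input_path": "./docs"}},
    "tasks_list": ["textchat", "retrieval"],
}
```

Running `validate_config(example)` on the config above returns an empty list; disabling retrieval while keeping it in `tasks_list` would surface a problem instead.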