
[Graph] windows build (#312)
* fix win build error
* add win header
* modify MD
* clang-format 14
luoyu-intel authored Sep 14, 2023
1 parent c76c7e4 commit bffa1b0
Showing 4 changed files with 14 additions and 5 deletions.
8 changes: 8 additions & 0 deletions intel_extension_for_transformers/llm/runtime/graph/README.md
@@ -37,12 +37,20 @@ We support the following models:
## How to use

### 1. Build LLM Runtime
Linux
```shell
mkdir build
cd build
cmake .. -G Ninja
ninja
```
Windows: install Visual Studio 2022 (a validated version), search for 'Developer PowerShell for VS 2022' and open it, then run the following commands.
```powershell
mkdir build
cd build
cmake ..
cmake --build . -j
```

### 2. Convert LLM
LLM Runtime assumes the same model format as [llama.cpp](https://github.com/ggerganov/llama.cpp) and [ggml](https://github.com/ggerganov/ggml). You can also convert the model by following the steps below:
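The actual conversion steps are truncated in this diff view. As a rough, llama.cpp-style illustration only (the script name, flags, and file names below are assumptions, not taken from this repository):
```shell
# hypothetical sketch: convert a Hugging Face checkpoint to the ggml-style
# binary format the runtime loads, then quantize it for inference
python scripts/convert_model.py --outtype f32 --outfile ne-f32.bin /path/to/model
./build/bin/quantize ne-f32.bin ne-q4_0.bin q4_0
```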
@@ -25,6 +25,7 @@
#include <random>
#include <regex>
#include <thread>
#include <functional>

#include "core/data_types.h"
#include "core/ne_layers.h"
@@ -1759,7 +1759,7 @@ void jblas_fusion_attn_fp32_fp16_fp16_fp32_forward(const attn_fp32_fp16_fp16_fp3
// return jblas_fusion_attn_forward_ref(*reinterpret_cast<const attn_fwd_args_t<float, fp16, fp16, float>*>(params));
}

bool blas_fusion_attn_fp16_support(const attn_shape_t* params) {
bool jblas_fusion_attn_fp16_support(const attn_shape_t* params) {
#if CompileFP16()
GetCPUDevice();
// TODO check K V's layout
@@ -144,10 +144,10 @@ void CHATGLM2::load(model_context& lctx, model_progress_callback progress_callba
layer.attn[2] = ml->get_tensor(layers_i + ".self_attention.dense.weight", {n_embd, n_embd}, backend);

// ffn GEMM
layer.ffn[0] =
ml->get_tensor(layers_i + ".mlp.dense_h_to_4h.weight", {n_embd, model.hparams.ffn_hidden_size * 2}, backend);
layer.ffn[1] =
ml->get_tensor(layers_i + ".mlp.dense_4h_to_h.weight", {model.hparams.ffn_hidden_size, n_embd}, backend);
layer.ffn[0] = ml->get_tensor(layers_i + ".mlp.dense_h_to_4h.weight",
{n_embd, uint32_t(model.hparams.ffn_hidden_size * 2)}, backend);
layer.ffn[1] = ml->get_tensor(layers_i + ".mlp.dense_4h_to_h.weight",
{uint32_t(model.hparams.ffn_hidden_size), n_embd}, backend);

layer.k_cache = d_ne_new_tensor_3d(model.ctx, NE_TYPE_F16, 4096 / 32, 32768, 2);
layer.v_cache = d_ne_new_tensor_3d(model.ctx, NE_TYPE_F16, 32768, 4096 / 32, 2);
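The only change in this hunk is wrapping the dimensions in explicit `uint32_t(...)` casts (plus reflowing from clang-format 14). The likely reason is that MSVC rejects the implicit narrowing of a wider or signed integer expression inside a braced initializer list. A minimal illustration under that assumption (types, names, and values here are hypothetical, not from this file):
```cpp
#include <cstdint>
#include <initializer_list>

// stand-in for a get_tensor-style call that takes a list of uint32_t dimensions
static void take_dims(std::initializer_list<uint32_t>) {}

int main() {
  int64_t ffn_hidden_size = 13696;  // hypothetical width held in a wider type
  // take_dims({ffn_hidden_size * 2});        // narrowing conversion: rejected by MSVC
  take_dims({uint32_t(ffn_hidden_size * 2)});  // explicit cast builds on every toolchain
  return 0;
}
```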
