This repository has been archived by the owner on Oct 25, 2024. It is now read-only.

Commit

Update README.md
Signed-off-by: Haihao Shen <haihao.shen@intel.com>
hshen14 authored Jan 26, 2024
1 parent c4a8c5e commit b8ee438
Showing 1 changed file with 1 addition and 1 deletion.
README.md
@@ -247,7 +247,7 @@ model = ipex.optimize_transformers(model, inplace=True, dtype=torch.float16, woq

output = model.generate(inputs)
```
-> Note: Please refer to the [gpu example](https://github.com/intel/intel-extension-for-transformers/blob/main/docs/weightonlyquant.md#examples-for-gpu) and [gpu script](https://github.com/intel/intel-extension-for-transformers/blob/main/examples/huggingface/pytorch/text-generation/quantization/run_generation_gpu_woq.py). Known issue: if device memory is insufficient, save the model and load it again using the code in the [gpu example](https://github.com/intel/intel-extension-for-transformers/blob/main/docs/weightonlyquant.md#examples-for-gpu).
+> Note: Please refer to the [example](https://github.com/intel/intel-extension-for-transformers/blob/main/docs/weightonlyquant.md#examples-for-gpu) and [script](https://github.com/intel/intel-extension-for-transformers/blob/main/examples/huggingface/pytorch/text-generation/quantization/run_generation_gpu_woq.py) for more details.
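For context, here is a minimal sketch of the GPU weight-only-quantization flow that this note points at, reconstructed around the hunk shown above. The model name, prompt, `xpu` device string, and the completed `woq=True` argument are assumptions drawn from the linked gpu example, not part of this commit:

```python
import torch
import intel_extension_for_pytorch as ipex
from transformers import AutoTokenizer
from intel_extension_for_transformers.transformers.modeling import AutoModelForCausalLM

device = "xpu"  # Intel GPU; assumed per the linked gpu example
model_name = "Qwen/Qwen-7B"  # placeholder model choice for this sketch

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
inputs = tokenizer("Once upon a time,", return_tensors="pt").input_ids.to(device)

# Load with 4-bit weight-only quantization, then apply IPEX transformer optimizations.
model = AutoModelForCausalLM.from_pretrained(model_name, device_map=device, load_in_4bit=True)
model = ipex.optimize_transformers(model, inplace=True, dtype=torch.float16, woq=True, device=device)

output = model.generate(inputs)
print(tokenizer.decode(output[0]))
```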
### Langchain-based extension APIs
Below is sample code for using the extended Langchain APIs. See more [examples](intel_extension_for_transformers/neural_chat/pipeline/plugins/retrieval/README.md).
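The Langchain sample itself sits in the collapsed portion of this diff. As a rough illustration only, here is a minimal sketch of what a retrieval chain built on the extended APIs could look like, assuming the `Chroma` drop-in exported from `intel_extension_for_transformers.langchain.vectorstores` per the linked retrieval README; the corpus, embedding model, and `gpt2` placeholder LLM are invented for this sketch:

```python
from langchain.chains import RetrievalQA
from langchain_core.documents import Document
from langchain_core.vectorstores import VectorStoreRetriever
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.llms.huggingface_pipeline import HuggingFacePipeline
from intel_extension_for_transformers.langchain.vectorstores import Chroma  # extended vector store (assumed import path)

# Tiny in-memory corpus; real usage would load documents from files.
docs = [Document(page_content="Intel Extension for Transformers accelerates LLM inference on Intel platforms.")]

# Build the extended Chroma store and wrap it in a standard Langchain retriever.
vectordb = Chroma.from_documents(documents=docs, embedding=HuggingFaceEmbeddings())
retriever = VectorStoreRetriever(vectorstore=vectordb)

# Any Langchain-compatible LLM works; gpt2 is only a lightweight stand-in.
llm = HuggingFacePipeline.from_model_id(model_id="gpt2", task="text-generation")
qa = RetrievalQA.from_llm(llm=llm, retriever=retriever)
print(qa.run("What does Intel Extension for Transformers do?"))
```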
