[Feature] Support using accelerate and bitsandbytes #347

hechuanlong · 2023-04-02T15:01:55Z

Is your feature request related to a problem? Please describe.

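On Windows I hit the error "Symbol cudaLaunchKernel not found, ..., RuntimeError: Library cublasLt is not initialized". A search shows that many people have run into the same problem, but I could not find a clear fix. The "Symbol cudaLaunchKernel not found" message makes me suspect a CUDA/torch version mismatch. I don't want to upgrade either one, though, because https://github.com/tloen/alpaca-lora runs fine in this same environment.

So I tried adapting the approach used in alpaca-lora. It runs, and fairly quickly too, so I'm sharing it here in case it is a useful reference and can be optimized further.

1. Modify cli_demo.py (web_demo.py should work the same way):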

diff --git a/cli_demo.py b/cli_demo.py
index da80fff..97613f5 100644
--- a/cli_demo.py
+++ b/cli_demo.py
@@ -1,10 +1,15 @@
 import os
 import platform
 import signal
+import torch
 from transformers import AutoTokenizer, AutoModel

 tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
-model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).half().cuda()
+#model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).half().cuda()
+model = AutoModel.from_pretrained("THUDM/chatglm-6b", load_in_8bit=True, trust_remote_code=True,
+        torch_dtype=torch.float16,
+        device_map="auto",)
 model = model.eval()

2. Install accelerate and bitsandbytes:

pip install accelerate
pip install bitsandbytes

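3. At this point bitsandbytes may still not work; on Windows it raises an error. Apply the changes described in steps 5 and 6 of this comment:
oobabooga/text-generation-webui#147 (comment)

Finally, run "python cli_demo.py" and it works.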
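
Before running the demo, it can help to confirm that the two packages from step 2 are importable in the active environment. The following is a minimal sketch, not part of the original patch; the helper name `missing_packages` is hypothetical, and it only checks that the packages can be found, not that bitsandbytes can load its CUDA library on Windows.

```python
import importlib.util


def missing_packages(names):
    """Return the subset of `names` that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Package names taken from the pip commands in step 2.
    missing = missing_packages(["accelerate", "bitsandbytes"])
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("accelerate and bitsandbytes are both installed.")
```

If the check reports nothing missing and the demo still fails on import, the Windows-specific fix from step 3 is the next thing to look at.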

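I observed that the default implementation first loads all the data into host memory; RAM usage quickly reaches 16 GB (and then it fails with an error).
With the changes above, the data is loaded into GPU memory incrementally. RAM usage stays at around 4 GB, GPU memory usage is around 9 GB once loading finishes, and loading is also faster than before.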
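
A rough back-of-the-envelope estimate helps put these numbers in context. The sketch below assumes ChatGLM-6B has about 6.2 billion parameters (the figure from the model's README, not from this thread) and counts only the raw weights. Anything else the process allocates comes on top, so the result is a lower bound rather than a prediction of the roughly 9 GB reported above.

```python
GIB = 1024 ** 3


def weight_gib(n_params: float, bytes_per_param: int) -> float:
    """Size of the raw weights in GiB, ignoring activations and runtime overhead."""
    return n_params * bytes_per_param / GIB


if __name__ == "__main__":
    n_params = 6.2e9  # assumption: approximate parameter count of ChatGLM-6B
    print(f"fp16 weights: {weight_gib(n_params, 2):.1f} GiB")
    print(f"int8 weights: {weight_gib(n_params, 1):.1f} GiB")
```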

Solutions

Could accelerate and bitsandbytes be supported?

Additional context

Test environment: Win11 + RTX 3060 + CUDA Version: 11.6 + torch 1.13.1

YXHXianYu · 2023-04-05T05:34:33Z

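It worked; the model runs normally.

There is one problem, though: after switching to bitsandbytes, it seems I can no longer fine-tune with the ptuning example code provided in this repository.

During fine-tuning, the training part trains normally (I also modified the training script main.py as described above), but the inference part fails to run with the following error: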

Traceback (most recent call last):
  File "D:\Codes\AI\ChatGLM\ChatGLM-6B\ptuning\main.py", line 403, in <module>
    main()
  File "D:\Codes\AI\ChatGLM\ChatGLM-6B\ptuning\main.py", line 323, in main
    trainer = Seq2SeqTrainer(
  File "D:\Codes\AI\ChatGLM\ChatGLM-6B\venv\lib\site-packages\transformers\trainer.py", line 362, in __init__
    raise ValueError(
ValueError: The model you want to train is loaded in 8-bit precision. Training an 8-bit model is not supported yet.

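Judging from the last line of the exception, it looks like some interface in the ptuning code does not support bitsandbytes 8-bit models?

But since I was fine-tuning with 8-bit quantization anyway, I'm not sure whether this exception is the fault of bitsandbytes or of the ptuning arguments. (The ptuning README in this repository doesn't say that 8-bit quantization is unsupported, so I suspect it's a bitsandbytes problem.)

The full error log is attached:
error_log.txt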
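
The last frame of the traceback (`trainer.py`, `__init__`) suggests the message is raised by the `Trainer` constructor in transformers rather than by the ptuning script itself. A minimal sketch of a pre-check follows. It assumes that models loaded with `load_in_8bit=True` carry an `is_loaded_in_8bit` attribute, which is my understanding of how transformers marks them; the helper name `is_8bit_model` is hypothetical.

```python
from types import SimpleNamespace


def is_8bit_model(model) -> bool:
    """True if the model reports that it was loaded in 8-bit precision."""
    # Assumption: transformers sets `is_loaded_in_8bit` on models
    # that were loaded with load_in_8bit=True.
    return bool(getattr(model, "is_loaded_in_8bit", False))


if __name__ == "__main__":
    # Stand-in objects so the sketch runs without downloading a model.
    quantized = SimpleNamespace(is_loaded_in_8bit=True)
    regular = SimpleNamespace()
    for name, model in [("quantized", quantized), ("regular", regular)]:
        if is_8bit_model(model):
            print(f"{name}: 8-bit flag set, Seq2SeqTrainer is expected to reject it")
        else:
            print(f"{name}: no 8-bit flag set")
```

Running such a check before constructing `Seq2SeqTrainer` would at least make it clear whether the 8-bit loading change, rather than the ptuning arguments, is what triggers the error.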

hechuanlong · 2023-04-05T07:49:43Z

I haven't tried fine-tuning yet. You could try this: https://github.com/mymusise/ChatGLM-Tuning

YXHXianYu · 2023-04-05T07:58:54Z

OK, I'll take a look o(*￣▽￣*)o

dofish · 2023-04-13T12:25:46Z

8-bit weights are not supported on multiple GPUs. Revert to use one GPU.
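
One way to follow this advice without changing hardware is to hide the other GPUs from the process. The sketch below sets `CUDA_VISIBLE_DEVICES`, which has to happen before torch initializes CUDA. The helper name `use_single_gpu` is hypothetical, and whether this resolves the message in a given setup is untested here.

```python
import os


def use_single_gpu(index: int = 0) -> None:
    """Expose only one GPU to CUDA libraries loaded later in this process."""
    os.environ["CUDA_VISIBLE_DEVICES"] = str(index)


if __name__ == "__main__":
    # Call this before importing torch or transformers.
    use_single_gpu(0)
    print("CUDA_VISIBLE_DEVICES =", os.environ["CUDA_VISIBLE_DEVICES"])
```

With only one device visible, `device_map="auto"` has a single GPU to choose from when placing the model.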

nameless0704 mentioned this issue Apr 14, 2023

How to switch to multi-GPU inference? chatchat-space/Langchain-Chatchat#77

Closed

runeq99 mentioned this issue May 8, 2023

RuntimeError: Library cublasLt is not initialized[BUG/Help] <title> #465

Closed

zhangch9 closed this as completed Aug 16, 2023
