Describe the bug
When attempting to run the LoRA on GPU, nothing happens.
This occurs on my non-WSL, WSL, and separate Linux boot environments on the same machine.
It does, however, generate and work properly when using the --cpu option.
Also of note: I had to replace bitsandbytes_cpu.so with bitsandbytes_cuda117.so to get it to function at all, in case that is related.
Is there an existing issue for this?
I have searched the existing issues
Reproduction
Run the command line: python server.py --listen --load-in-8bit. Select the 7b weights, go to Parameters, and select alpaca-lora-7b. Using the default prompt and parameters, select Generate. Nothing happens other than a log showing 0 tokens and a transformers warning.
Screenshot
No response
Logs
Output generated in 0.23 seconds (0.00 tokens/s, 0 tokens)
C:\Users\Arargd\miniconda3\envs\textgen\lib\site-packages\transformers\generation\utils.py:1374: UserWarning: You are calling .generate() with the `input_ids` being on a device type different than your model's device. `input_ids` is on cuda, whereas the model is on cpu. You may experience unexpected behaviors or slower generation. Please make sure that you have put `input_ids` to the correct device by calling for example input_ids = input_ids.to('cpu') before running `.generate()`.
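The warning above points at the likely cause: the model (or the LoRA layers) stayed on the CPU while the tokenized inputs were placed on CUDA. A minimal sketch of the usual fix is to move the batch onto whatever device the model's parameters actually live on before calling `.generate()`. This is a generic illustration, not the web UI's code; `to_model_device` and the tiny stand-in model are hypothetical names used so the sketch runs without downloading weights.

```python
import torch

def to_model_device(batch, model):
    """Move every tensor in a tokenized batch onto the model's device."""
    device = next(model.parameters()).device
    return {k: v.to(device) for k, v in batch.items()}

# Tiny stand-in for a real transformers model; it lives on CPU by default,
# mirroring the situation in the warning (model on cpu, input_ids on cuda).
model = torch.nn.Linear(4, 4)
batch = {"input_ids": torch.zeros(1, 4, dtype=torch.long)}

moved = to_model_device(batch, model)
assert moved["input_ids"].device == next(model.parameters()).device
```

With a real model you would call `model.generate(**to_model_device(batch, model))`; the point is that the inputs must follow the model's device, not the other way around.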
System Info
Windows 10 (both WSL and non WSL)
Linux Ubuntu 22.04.2
RTX 2070 8GB
32 GB 3600 MHz DDR4 RAM
I have commented in that issue, but yes, that works for getting me to generate normally without the LoRA. With the LoRA attached, however, it refuses to generate anything on GPU.