convert-hf-to-gguf.py Qwen-72B-Chat model get Killed result #5156
Comments
I assume you have run out of memory. How much RAM do you have? |
I have 64 GB of RAM and 32 CPU cores. |
The process is likely to be killed because of low memory: https://stackoverflow.com/questions/726690/what-killed-my-process-and-why |
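For anyone wondering what "Killed" means at the process level: it normally indicates the process received SIGKILL, which is the signal the Linux OOM killer sends. Here is a minimal sketch (not part of llama.cpp) that reproduces this on a POSIX system; the child process and the variable names are made up for illustration, and the child kills itself rather than actually running out of memory.

```python
import signal
import subprocess
import sys

# Hypothetical child process that sends SIGKILL to itself, imitating
# what the kernel's OOM killer does to a process that uses too much memory.
child_code = "import os, signal; os.kill(os.getpid(), signal.SIGKILL)"
result = subprocess.run([sys.executable, "-c", child_code])

# On POSIX, a negative return code -N means the child was terminated by signal N.
killed_by_sigkill = result.returncode == -signal.SIGKILL
print("return code:", result.returncode)
print("killed by SIGKILL:", killed_by_sigkill)
```

If the conversion script is launched the same way, a return code equal to -signal.SIGKILL would be consistent with the OOM explanation, although it does not prove it; the kernel log is the place to confirm.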
It seems like a bug, and the offending line is llama.cpp/convert-hf-to-gguf.py Line 999 in a1d6df1
Which makes the script load the whole (80GB+) model into memory, instead of using
@lmxin123 Could you try the script with the changes below?
diff --git a/convert-hf-to-gguf.py b/convert-hf-to-gguf.py
index 7a0a8c3d..8cef8429 100755
--- a/convert-hf-to-gguf.py
+++ b/convert-hf-to-gguf.py
@@ -996,9 +996,8 @@ class QwenModel(Model):
     def write_tensors(self):
         block_count = self.hparams["num_hidden_layers"]
-        model_kv = dict(self.get_tensors())
         tensor_map = gguf.get_tensor_name_map(self.model_arch, block_count)
-        for name, data_torch in model_kv.items():
+        for name, data_torch in self.get_tensors():
             # we don't need these
             if name.endswith(".rotary_emb.inv_freq"):
                 continue
|
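To illustrate why the dict(...) call in the patch above matters, here is a rough sketch in plain Python. The get_tensors function below is a hypothetical stand-in that yields byte buffers, not the real method from convert-hf-to-gguf.py, and the sizes are arbitrary. It only shows the general effect: wrapping a generator in dict() keeps every item alive at once, while iterating over the generator directly lets earlier items be freed.

```python
import tracemalloc


def get_tensors(n_tensors=8, size=1_000_000):
    """Hypothetical stand-in for a tensor generator: yields (name, data) pairs lazily."""
    for i in range(n_tensors):
        yield f"blk.{i}.weight", bytearray(size)


def peak_bytes(consume):
    """Run consume() and return the peak number of bytes Python allocated meanwhile."""
    tracemalloc.start()
    consume()
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak


def eager():
    # Materializes every tensor before the loop starts (the pre-patch pattern).
    model_kv = dict(get_tensors())
    for name, data in model_kv.items():
        len(data)


def lazy():
    # Only the current tensor (and briefly the next one) is alive at a time.
    for name, data in get_tensors():
        len(data)


eager_peak = peak_bytes(eager)
lazy_peak = peak_bytes(lazy)
print(f"eager peak: {eager_peak / 1e6:.1f} MB, lazy peak: {lazy_peak / 1e6:.1f} MB")
```

With real checkpoints the picture is more complicated, since the script may hold memory elsewhere (for example while writing the output), so this is not a measurement of the actual converter.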
Thank you for your response, but unfortunately, I don't know Python, and I'm unable to test your modifications. I look forward to an official update of the version. |
Having the same issue with converting falcon-40b on a machine with 24GB RAM. Process gets killed most likely due to lack of memory. I applied the patch from your response but it didn't help unfortunately. |
@lmxin123 Could you try with this script? https://gist.github.com/Galunid/c169dd4078c9cb11e8d8a4a8888eab2b |
@timopb Falcon is a separate issue and the above is not applicable. |
I'm having the same problem even with the new script by @Galunid. It's not just loading the original model into RAM; it's also writing the new model to RAM first, instead of to disk. |
This issue is stale because it has been open for 30 days with no activity. |
This issue was closed because it has been inactive for 14 days since being marked as stale. |
This issue is likely caused by an Out of Memory (OOM) error. You can prevent OOM by utilizing virtual memory and creating a swap file to allocate additional resources. Here's how you can establish a swap file and apply it to your virtual memory:
With the swap file in place, you should be able to transform your model without encountering an OOM error. After you have successfully transformed your model, you can disable the swap file from the virtual memory and delete it to free up space. Here's how:
By following these steps, you can effectively manage your system's memory resources and prevent OOM errors during model transformations. btw, Good luck~
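Before sizing a swap file, it can help to estimate how much memory is actually missing. The sketch below parses text in the /proc/meminfo format and compares MemAvailable plus SwapFree against a target size. The helper names, the sample values, and the 80 GiB target (taken loosely from the "80GB+" figure mentioned earlier in this thread) are assumptions for illustration only.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict mapping field name to its numeric value."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[key.strip()] = int(parts[0])
    return info


def extra_swap_gib(meminfo_text, needed_gib):
    """Return how many GiB are missing to reach needed_gib, or 0.0 if there is enough."""
    info = parse_meminfo(meminfo_text)
    # MemAvailable and SwapFree are reported in kB (treated here as KiB).
    available_kib = info.get("MemAvailable", 0) + info.get("SwapFree", 0)
    shortfall_kib = needed_gib * 1024**2 - available_kib
    return max(0.0, shortfall_kib / 1024**2)


# Illustrative sample, not taken from any machine in this thread.
sample = """MemTotal:       65536000 kB
MemAvailable:   52428800 kB
SwapTotal:             0 kB
SwapFree:              0 kB
"""

shortfall = extra_swap_gib(sample, needed_gib=80)
print(f"additional swap needed: {shortfall:.1f} GiB")
```

On a Linux machine the sample string could be replaced with the contents of /proc/meminfo. Note that heavy swapping can make the conversion very slow, so this is a workaround rather than a fix.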
|
I use python convert-hf-to-gguf.py /Qwen-72B-Chat.
And I am getting the same error:
blk.33.ffn_down.weight, n_dims = 2, torch.bfloat16 --> float16
blk.33.ffn_up.weight, n_dims = 2, torch.bfloat16 --> float16
blk.33.ffn_gate.weight, n_dims = 2, torch.bfloat16 --> float16
blk.34.attn_qkv.bias, n_dims = 1, torch.bfloat16 --> float32
blk.34.attn_qkv.weight, n_dims = 2, torch.bfloat16 --> float16
blk.34.attn_output.weight, n_dims = 2, torch.bfloat16 --> float16
blk.34.attn_norm.weight, n_dims = 1, torch.bfloat16 --> float32
blk.34.ffn_norm.weight, n_dims = 1, torch.bfloat16 --> float32
blk.34.ffn_up.weight, n_dims = 2, torch.bfloat16 --> float16
blk.34.ffn_down.weight, n_dims = 2, torch.bfloat16 --> float16
blk.34.ffn_gate.weight, n_dims = 2, torch.bfloat16 --> float16
blk.35.attn_qkv.bias, n_dims = 1, torch.bfloat16 --> float32
blk.35.attn_qkv.weight, n_dims = 2, torch.bfloat16 --> float16
blk.35.attn_output.weight, n_dims = 2, torch.bfloat16 --> float16
blk.35.attn_norm.weight, n_dims = 1, torch.bfloat16 --> float32
blk.35.ffn_norm.weight, n_dims = 1, torch.bfloat16 --> float32
Killed
What does “Killed” mean here?
@ggerganov @slaren @prusnak