Feature request

It would be nice to be able to disable mmap when loading models, to increase inference speed. I am only guessing that mmap is to blame, since my reported memory usage stays very low when loading a large LLaMA-based model.

Related issue in llama.cpp

Motivation

I am getting low inference speeds (and low reported memory usage) when loading large LLaMA-based models such as llama-30b.ggmlv3.q5_K_M. The ability to disable mmap could help improve this.

Your contribution

None.

---

Follow-up from the author: Looks like my low inference speed was due to exceeding my RAM limit and going to swap. I didn't realize it at the time because the OS does not report mmapped pages as process RAM usage. Closing, as I no longer see a reason for someone to disable mmap.
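For context on the author's observation, here is a minimal Python sketch of why mmap keeps reported memory low. It uses the standard-library `mmap` module on a throwaway file, not llama.cpp's actual loader (which is C/C++): mapped pages are only read from disk when first touched, so resident-memory (RSS) counters stay low until the data is actually accessed — and once the mapped working set exceeds physical RAM, the OS evicts and re-reads pages, which shows up as slow inference rather than high RAM usage.

```python
import mmap
import os
import tempfile

# Create a 1 MiB file as a stand-in for a model weights file.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"x" * (1 << 20))
    path = f.name

with open(path, "rb") as f:
    # Mapping the file reserves address space but does not read it into
    # RAM, which is why memory-usage tools show almost nothing here.
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        first = mm[0]   # touching a byte faults in only that page
        size = len(mm)  # full file size is addressable immediately

os.unlink(path)
print(first, size)  # 120 1048576  (ord("x"), 1 MiB)
```

If this issue is against llama-cpp-python, its `Llama` constructor exposes a `use_mmap` flag that forces a conventional read into RAM when set to `False`; whether that knob is plumbed through in the project this issue was filed against is an assumption on my part.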