Turns out performance of llama.cpp batched generation on CUDA devices is pretty poor; the best speed-up is ~2x on an A6000. Not going to bother for so little gain.
Batch generation has landed: ggerganov/llama.cpp#3228
This should make our test suite ~10x faster on GGUF models.