v0.13.1

github-actions released this 10 Jul 01:21
· 533 commits to main since this release

⚠️ Notice

🧰 Fixes and Improvements

  • Bump llama.cpp to b3334, which adds support for the DeepSeek V2 series of models.
  • Turn on flash attention for the Qwen2-1.5B model to fix the quantization error.
  • Set the number of GPU layers to zero when the device is CPU.
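
The CPU fix in the last item can be pictured with a minimal sketch. This is not the project's actual code: the `Device` enum and the `gpu_layers_for` helper are hypothetical names chosen for illustration. The only assumption about llama.cpp is that the layer count is the value passed through its `--n-gpu-layers` (`-ngl`) option.

```rust
/// Hypothetical device selector; the real type in the codebase may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Device {
    Cpu,
    Cuda,
    Metal,
}

/// Hypothetical helper: decide how many model layers to offload to the GPU.
/// On a CPU device nothing can be offloaded, so the requested count is ignored.
fn gpu_layers_for(device: Device, requested: u32) -> u32 {
    match device {
        Device::Cpu => 0,
        Device::Cuda | Device::Metal => requested,
    }
}

fn main() {
    // Print the value that would be handed to llama.cpp for each device.
    for device in [Device::Cpu, Device::Cuda, Device::Metal] {
        println!("{:?}: n-gpu-layers = {}", device, gpu_layers_for(device, 9999));
    }
}
```

Forcing the count to zero on CPU means a large default offload value never reaches a build that has no GPU backend.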