llamafile v0.2.1
llamafile lets you distribute and run LLMs with a single file. See our README file for documentation and to learn more.
Changes
- 95703b6 Fix support for old Intel CPUs
- 401dd08 Add OpenAI API compatibility to server (see the example after this list)
- e5c2315 Make server open tab in browser on startup
- 865462f Cherry pick StableLM support from llama.cpp
- 8f21460 Introduce pledge() / seccomp security to llama.cpp
- 711344b Fix server so it doesn't consume 100% cpu when idle
- 12f4319 Add single-client multi-prompt support to server
- c64989a Add --log-disable flag to server
- 90fa20f Fix typical sampling (#4261)
- e574488 Reserve space in decode_utf8
- 481b6a5 Look for GGML DSO before looking for NVCC
- 41f243e Check for i/o errors in httplib read_file()
- ed87fdb Fix uninitialized variables in server
- c5d35b0 Avoid CUDA assertion error with some models
- c373b5d Fix LLaVA regression for square images
- 176e54f Fix server crash when prompt exceeds context size
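The new OpenAI API compatibility can be exercised with a few lines of Python. This is a minimal sketch, assuming a llamafile server is already running on the default http://localhost:8080 and that its /v1/chat/completions route accepts the standard OpenAI request shape; the model name below is a placeholder, since the local server answers with whatever model it loaded.

```python
# Minimal sketch: query a running llamafile server through its
# OpenAI-compatible endpoint (assumes http://localhost:8080).
import json
import urllib.request

request = urllib.request.Request(
    "http://localhost:8080/v1/chat/completions",
    data=json.dumps({
        "model": "local-model",  # placeholder; ignored by a single-model local server
        "messages": [
            {"role": "user", "content": "Say hello in one sentence."}
        ],
        "temperature": 0.7,
    }).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(request) as response:
    reply = json.loads(response.read())

# The response follows the OpenAI chat-completion schema.
print(reply["choices"][0]["message"]["content"])
```

Because the route mirrors the OpenAI schema, existing OpenAI client libraries can usually be pointed at the local server by changing only the base URL.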
Example Llamafiles
Our .llamafiles on Hugging Face have been updated to incorporate these new release binaries. You can re-download them here:
- https://huggingface.co/jartine/llava-v1.5-7B-GGUF/tree/main
- https://huggingface.co/jartine/mistral-7b.llamafile/tree/main
- https://huggingface.co/jartine/wizardcoder-13b-python/tree/main
If you have a slower Internet connection and don't want to re-download, then you don't have to! Instructions are here: