llamafile v0.2.1
llamafile lets you distribute and run LLMs with a single file. See our README file for documentation and to learn more.
Changes
- 95703b6 Fix support for old Intel CPUs
- 401dd08 Add OpenAI API compatibility to server (see the example after this list)
- e5c2315 Make server open tab in browser on startup
- 865462f Cherry pick StableLM support from llama.cpp
- 8f21460 Introduce pledge() / seccomp security to llama.cpp
- 711344b Fix server so it doesn't consume 100% cpu when idle
- 12f4319 Add single-client multi-prompt support to server
- c64989a Add --log-disable flag to server
- 90fa20f Fix typical sampling (#4261)
- e574488 Reserve space in decode_utf8
- 481b6a5 Look for GGML DSO before looking for NVCC
- 41f243e Check for i/o errors in httplib read_file()
- ed87fdb Fix uninitialized variables in server
- c5d35b0 Avoid CUDA assertion error with some models
- c373b5d Fix LLaVA regression for square images
- 176e54f Fix server crash when prompt exceeds context size
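The new OpenAI API compatibility can be exercised with a few lines of Python. This is a minimal sketch, assuming a llamafile server is already running on the default http://localhost:8080 and that its /v1/chat/completions route accepts the standard OpenAI request shape; the model name below is a placeholder, since the local server answers with whatever model it loaded.

```python
# Minimal sketch: query a running llamafile server through its
# OpenAI-compatible endpoint (assumes http://localhost:8080).
import json
import urllib.request

request = urllib.request.Request(
    "http://localhost:8080/v1/chat/completions",
    data=json.dumps({
        "model": "local-model",  # placeholder; ignored by a single-model local server
        "messages": [
            {"role": "user", "content": "Say hello in one sentence."}
        ],
        "temperature": 0.7,
    }).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(request) as response:
    reply = json.loads(response.read())

# The response follows the OpenAI chat-completion schema.
print(reply["choices"][0]["message"]["content"])
```

Because the route mirrors the OpenAI schema, existing OpenAI client libraries can usually be pointed at the local server by changing only the base URL.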
Example Llamafiles
Our .llamafiles on Hugging Face have been updated to incorporate these new release binaries. You can re-download them here:
- https://huggingface.co/jartine/llava-v1.5-7B-GGUF/tree/main
- https://huggingface.co/jartine/mistral-7b.llamafile/tree/main
- https://huggingface.co/jartine/wizardcoder-13b-python/tree/main
If you have a slower Internet connection and don't want to re-download, then you don't have to! Instructions are here: