cyllama: a thin cython wrapper around llama.cpp #10650
shakfu started this conversation in Show and tell
Hi folks,
Ok, this is my show and tell 😄
In case anyone's interested, I've been working for some time on the open-source cyllama project, a thin cython wrapper for llama.cpp. It was spun off from an earlier, now-frozen wrapper project, llamalib, which provided early-stage but functional llama.cpp wrappers using cython, pybind11, and nanobind.
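To make "thin wrapper" concrete, here is a minimal sketch of the general cython technique; this is illustrative only, not cyllama's actual source. The two `llama.h` declarations below do exist in llama.cpp, though signatures can shift between versions:

```cython
# wrapper_sketch.pyx -- a minimal illustration of wrapping llama.h
# with cython; not taken from cyllama's source.
cdef extern from "llama.h":
    # Real llama.h entry points (as of recent llama.cpp versions;
    # they may change as the upstream API evolves).
    void llama_backend_init()
    void llama_backend_free()

def init_backend():
    """Initialize the llama.cpp backend once per process."""
    llama_backend_init()

def free_backend():
    """Release backend resources at shutdown."""
    llama_backend_free()
```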
In cyllama, `libllama.a`, `libggml.a`, and other related static libs are statically linked into the python extension for simplicity and performance: as a wheel it's around 1.2 MB. It can perform basic inference via a high-level and a lower-level interface wrapping `llama.h` and, as necessary, parts of `common.h` and other headers (a hypothetical usage sketch follows the goals list below). It generally tries to keep up with the latest changes in llama.cpp while maintaining some stability, in the sense that all tests pass and compilation is error-free between updates.

Development goals are to:
- Stay up-to-date with bleeding-edge llama.cpp.
- Produce a minimal, performant, compiled, thin python wrapper around the core llama-cli feature-set of llama.cpp.
- Integrate and wrap llava-cli features.
- Integrate and wrap features from related projects such as whisper.cpp and stable-diffusion.cpp.
- Learn about the internals of this popular C/C++ LLM inference engine along the way. This is definitely the most efficient way, for me at least, to learn about the underlying technologies.
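As promised above, here is a rough feel for what calling it from python could look like. The class, method, and argument names below are illustrative assumptions, not cyllama's documented API; check the project README for the real interface:

```python
# Hypothetical usage sketch -- `Llama`, `ask`, and the argument
# names here are assumptions for illustration only; consult the
# cyllama README for the actual high-level API.
from cyllama import Llama

# Placeholder model path: any local GGUF model file.
llm = Llama(model_path="models/model-q8_0.gguf")
print(llm.ask("When did the universe begin?", max_tokens=64))
```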
If you try it, please provide feedback, ask questions, post bugs, etc. Any contributions are welcome!