experiments with inference on llama

ivanbaldo/llama-inference

llama inference

Exploration of single-request latency across various llama inference setups.

Caveats

  • I didn't explore throughput. That is a deep rabbit hole - I was only measuring latency for a single request. You can trade off throughput against latency with various forms of request batching.
  • I did my best to use each tool as its documentation recommends.
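For context, a minimal sketch of how single-request latency can be measured. The `generate` function below is a hypothetical stand-in for a real model call (the actual experiments use various inference backends); the harness itself is the point: warm up first, then take the median over several timed runs.

```python
import time


def generate(prompt: str) -> str:
    # Hypothetical stand-in for a real inference call;
    # replace with the backend under test.
    return prompt[::-1]


def measure_latency(prompt: str, warmup: int = 1, runs: int = 5) -> float:
    """Return the median wall-clock latency (seconds) of a single request."""
    # Warm-up requests avoid counting one-time costs (model load, caches).
    for _ in range(warmup):
        generate(prompt)
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        generate(prompt)
        samples.append(time.perf_counter() - start)
    # Median is more robust to scheduling outliers than the mean.
    samples.sort()
    return samples[len(samples) // 2]


if __name__ == "__main__":
    latency = measure_latency("Hello, llama!")
    print(f"median single-request latency: {latency * 1000:.3f} ms")
```

With batching disabled and a single concurrent request, this measures the latency floor; batched serving would raise per-request latency while improving aggregate throughput.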
