v0.1.8

@jjleng released this 19 May 08:13
· 29 commits to main since this release

What's Changed

  • feat: support inference with spot instances by @jjleng in #86
  • feat: support mixed model groups by @jjleng in #87
  • feat: flag to prevent model groups from being exposed to the public by @jjleng in #88
  • feat: heuristics for calculating the memory, CPU and GPU resource requests by @jjleng in #89
  • fix: typing issues in e2e test code by @jjleng in #90
  • feat: command to list function revisions by @jjleng in #91
  • feat: command to split traffic among revisions by @jjleng in #92
  • refactor: make typer command args more consistent by @jjleng in #93
  • refactor: save kubeconfig as soon as cluster is provisioned by @jjleng in #94
  • feat: options to pass resource requests and limits when creating functions by @jjleng in #95
  • feat: utilize multiple GPUs for vllm inference by @jjleng in #96
  • feat: utility to parse gguf metadata by @jjleng in #97
  • fix: permissions to allow cluster autoscaler to scale from 0 by @jjleng in #98
  • chore: bump up version by @jjleng in #99
  • chore: add homepage info to pyproject by @jjleng in #100

Full Changelog: v0.1.7...v0.1.8