v0.1.8
What's Changed
- feat: support inference with spot instances by @jjleng in #86
- feat: support mixed model groups by @jjleng in #87
- feat: flag to make model groups not exposed to the public by @jjleng in #88
- feat: heuristics for calculating the memory, cpu and gpu resource requests by @jjleng in #89
- fix: typing issues in e2e test code by @jjleng in #90
- feat: command to list function revisions by @jjleng in #91
- feat: command to split traffic among revisions by @jjleng in #92
- refactor: make typer command args more consistent by @jjleng in #93
- refactor: save kubeconfig as soon as cluster is provisioned by @jjleng in #94
- feat: options to pass resource requests and limits when creating functions by @jjleng in #95
- feat: utilize multiple GPUs for vllm inference by @jjleng in #96
- feat: utility to parse gguf metadata by @jjleng in #97
- fix: permissions to allow cluster autoscaler to scale from 0 by @jjleng in #98
- chore: bump up version by @jjleng in #99
- chore: add homepage info to pyproject by @jjleng in #100
Full Changelog: v0.1.7...v0.1.8