Releases: huggingface/text-generation-inference
v0.3.1
Features
- server: allocate full attention mask to decrease latency
- server: enable hf-transfer for much faster download speeds
- router: add CORS options
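For context on the hf-transfer feature: in the huggingface_hub ecosystem, the Rust-based hf-transfer downloader is opted into via an environment variable. A minimal sketch (this shows the standard `HF_HUB_ENABLE_HF_TRANSFER` variable from huggingface_hub; how the server wires it up internally may differ):

```python
import os

# Opt into the Rust-based hf-transfer downloader used by huggingface_hub.
# The variable must be set before the first download is initiated,
# so do this at process startup.
os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "1"
```

In practice this is most useful on high-bandwidth machines, where the default Python downloader can become the bottleneck.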
Fix
- server: remove position_ids from galactica forward
v0.3.0
Features
- server: support t5 models
- router: add max_total_tokens and empty_input validation
- launcher: add the possibility to disable custom CUDA kernels
- server: add automatic safetensors conversion
- router: add prometheus scrape endpoint
- server, router: add distributed tracing
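A Prometheus scrape endpoint serves metrics in the Prometheus text exposition format: one metric per line, comment lines prefixed with `#`. A minimal client-side parsing sketch (the metric name and value below are invented for illustration and are not taken from the router):

```python
def parse_prometheus_line(line: str):
    """Parse one line of Prometheus text exposition format.

    Returns (metric_name_with_labels, value), or None for
    comment and blank lines.
    """
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    name, _, value = line.rpartition(" ")
    return name, float(value)

# Hypothetical scrape body resembling a /metrics response.
sample = """\
# HELP tgi_request_count Total requests
# TYPE tgi_request_count counter
tgi_request_count 42
"""
metrics = dict(filter(None, map(parse_prometheus_line, sample.splitlines())))
```

A real deployment would simply point a Prometheus server at the endpoint rather than parse it by hand; the sketch only illustrates the wire format.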
Fix
- launcher: copy current env vars to subprocesses
- docker: add note around shared memory
v0.2.1
Fix
- server: fix bug with repetition penalty when using GPUs and inference mode
v0.2.0
Features
- router: support token streaming using Server-Sent Events (SSE)
- router: support seeding
- server: support gpt-neox
- server: support santacoder
- server: support repetition penalty
- server: allow the server to use a local weight cache
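With Server-Sent Events, the client receives one `data:` line per generated token, with events separated by blank lines. A minimal client-side parsing sketch (the JSON payload shape here is illustrative, not the router's exact schema):

```python
import json

def iter_sse_events(raw: str):
    """Yield decoded JSON payloads from a Server-Sent Events body.

    Events are separated by blank lines; each payload line
    starts with the 'data:' field prefix.
    """
    for block in raw.split("\n\n"):
        for line in block.splitlines():
            if line.startswith("data:"):
                yield json.loads(line[len("data:"):].strip())

# Hypothetical stream body with two token events.
stream = (
    'data: {"token": {"text": "Hello"}}\n\n'
    'data: {"token": {"text": " world"}}\n\n'
)
tokens = [event["token"]["text"] for event in iter_sse_events(stream)]
```

Streaming lets a UI render tokens as they are generated instead of waiting for the full completion, which is the main latency win of this feature.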
Breaking changes
- router: refactor Token API
- router: modify /generate API to only return generated text
Misc
- router: use background task to manage request queue
- ci: docker build/push on update