
Releases: huggingface/text-generation-inference

v0.3.1

24 Feb 12:27
4b1c972

Features

  • server: allocate full attention mask to decrease latency
  • server: enable hf-transfer for insane download speeds
  • router: add CORS options
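hf-transfer is an opt-in, Rust-based download backend for huggingface_hub. A minimal sketch of enabling it from Python, assuming the standard HF_HUB_ENABLE_HF_TRANSFER environment variable documented by huggingface_hub and an installed hf_transfer package:

```python
import os

# The switch must be set before huggingface_hub is imported, since the hub
# library reads it when the download machinery is initialized.
# (Assumption: HF_HUB_ENABLE_HF_TRANSFER is the documented opt-in variable.)
os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "1"


def hf_transfer_enabled() -> bool:
    """Report whether the opt-in switch is set in this process."""
    return os.environ.get("HF_HUB_ENABLE_HF_TRANSFER") == "1"


print(hf_transfer_enabled())
```

Any subsequent model download through huggingface_hub in this process then uses the hf-transfer backend instead of plain HTTP requests.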

Fix

  • server: remove position_ids from galactica forward

v0.3.0

16 Feb 16:33
c720555

Features

  • server: support t5 models
  • router: add max_total_tokens and empty_input validation
  • launcher: add the possibility to disable custom CUDA kernels
  • server: add automatic safetensors conversion
  • router: add prometheus scrape endpoint
  • server, router: add distributed tracing
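A Prometheus scrape endpoint serves metrics in the Prometheus text exposition format. The sketch below parses the un-labelled samples of such a payload; the metric names in the sample are illustrative, not taken from the actual router, and the endpoint path is not specified in these notes:

```python
def parse_prometheus_text(payload: str) -> dict[str, float]:
    """Parse un-labelled samples from a Prometheus text-format payload.

    Comment lines (# HELP / # TYPE) are skipped; each remaining line has
    the shape 'metric_name value'. Labelled metrics are out of scope for
    this sketch.
    """
    metrics: dict[str, float] = {}
    for line in payload.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, value = line.rpartition(" ")
        metrics[name] = float(value)
    return metrics


# Hypothetical scrape output; metric names are invented for illustration.
sample = """\
# HELP tgi_request_count Total number of requests
# TYPE tgi_request_count counter
tgi_request_count 42
tgi_queue_size 3
"""

print(parse_prometheus_text(sample))
```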

Fix

  • launcher: copy current env vars to subprocesses
  • docker: add note around shared memory

v0.2.1

07 Feb 14:41
2fe5e1b

Fix

  • server: fix bug with repetition penalty when using GPUs and inference mode

v0.2.0

03 Feb 11:56
20c3c59

Features

  • router: support Token streaming using Server-Sent Events (SSE)
  • router: support seeding
  • server: support gpt-neox
  • server: support santacoder
  • server: support repetition penalty
  • server: allow the server to use a local weight cache
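Server-Sent Events frame a stream as `data: <payload>` lines separated by blank lines. A minimal client-side sketch of decoding token events from such a stream, assuming each payload is a JSON object with a `token.text` field (the exact schema is an assumption, not confirmed by these notes):

```python
import json
from typing import Iterable, Iterator


def sse_tokens(lines: Iterable[str]) -> Iterator[str]:
    """Yield token text from an SSE stream of 'data: {...}' lines.

    Blank lines separate events; any line that is not a data field is
    ignored, following the SSE framing rules.
    """
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue
        payload = json.loads(line[len("data:"):].strip())
        yield payload["token"]["text"]


# Hypothetical captured stream; the payload shape is illustrative only.
stream = [
    'data: {"token": {"text": "Hello"}}',
    "",
    'data: {"token": {"text": " world"}}',
    "",
]

print(list(sse_tokens(stream)))
```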

Breaking changes

  • router: refactor Token API
  • router: modify /generate API to only return generated text

Misc

  • router: use background task to manage request queue
  • ci: docker build/push on update