Releases: huggingface/text-generation-inference
v0.3.1
Features
- server: allocate full attention mask to decrease latency
- server: enable hf-transfer for much faster download speeds
- router: add CORS options
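For context on the hf-transfer feature: in the huggingface_hub ecosystem, the Rust-based hf-transfer downloader is opted into via an environment variable. A minimal sketch (this shows the standard `HF_HUB_ENABLE_HF_TRANSFER` variable from huggingface_hub; how the server wires it up internally may differ):

```python
import os

# Opt into the Rust-based hf-transfer downloader used by huggingface_hub.
# The variable must be set before the first download is initiated,
# so do this at process startup.
os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "1"
```

In practice this is most useful on high-bandwidth machines, where the default Python downloader can become the bottleneck.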
Fix
- server: remove position_ids from galactica forward
v0.3.0
Features
- server: support t5 models
- router: add max_total_tokens and empty_input validation
- launcher: add the possibility to disable custom CUDA kernels
- server: add automatic safetensors conversion
- router: add prometheus scrape endpoint
- server, router: add distributed tracing
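A Prometheus scrape endpoint serves metrics in the Prometheus text exposition format: one metric per line, comment lines prefixed with `#`. A minimal client-side parsing sketch (the metric name and value below are invented for illustration and are not taken from the router):

```python
def parse_prometheus_line(line: str):
    """Parse one line of Prometheus text exposition format.

    Returns (metric_name_with_labels, value), or None for
    comment and blank lines.
    """
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    name, _, value = line.rpartition(" ")
    return name, float(value)

# Hypothetical scrape body resembling a /metrics response.
sample = """\
# HELP tgi_request_count Total requests
# TYPE tgi_request_count counter
tgi_request_count 42
"""
metrics = dict(filter(None, map(parse_prometheus_line, sample.splitlines())))
```

A real deployment would simply point a Prometheus server at the endpoint rather than parse it by hand; the sketch only illustrates the wire format.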
Fix
- launcher: copy current env vars to subprocesses
- docker: add note around shared memory
v0.2.1
Fix
- server: fix bug with repetition penalty when using GPUs and inference mode
v0.2.0
Features
- router: support token streaming using Server-Sent Events (SSE)
- router: support seeding
- server: support gpt-neox
- server: support santacoder
- server: support repetition penalty
- server: allow the server to use a local weight cache
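With Server-Sent Events, the client receives one `data:` line per generated token, with events separated by blank lines. A minimal client-side parsing sketch (the JSON payload shape here is illustrative, not the router's exact schema):

```python
import json

def iter_sse_events(raw: str):
    """Yield decoded JSON payloads from a Server-Sent Events body.

    Events are separated by blank lines; each payload line
    starts with the 'data:' field prefix.
    """
    for block in raw.split("\n\n"):
        for line in block.splitlines():
            if line.startswith("data:"):
                yield json.loads(line[len("data:"):].strip())

# Hypothetical stream body with two token events.
stream = (
    'data: {"token": {"text": "Hello"}}\n\n'
    'data: {"token": {"text": " world"}}\n\n'
)
tokens = [event["token"]["text"] for event in iter_sse_events(stream)]
```

Streaming lets a UI render tokens as they are generated instead of waiting for the full completion, which is the main latency win of this feature.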
Breaking changes
- router: refactor Token API
- router: modify /generate API to only return generated text
Misc
- router: use background task to manage request queue
- ci: docker build/push on update