Actions: huggingface/text-generation-inference

Server Tests

2,406 workflow runs

Support qwen2 vl
Server Tests #3283: Pull request #2689 opened by drbh
October 24, 2024 20:02 8m 13s support-qwen2-vl
feat: add triton kernels to decrease latency of large batches
Server Tests #3282: Pull request #2687 synchronize by OlivierDehaene
October 24, 2024 17:17 6m 37s feat/triton_prepare
feat: add triton kernels to decrease latency of large batches
Server Tests #3281: Pull request #2687 synchronize by OlivierDehaene
October 24, 2024 17:01 8m 53s feat/triton_prepare
Switch from fbgemm-gpu w8a8 scaled matmul to vLLM/marlin-kernels
Server Tests #3280: Pull request #2688 synchronize by danieldk
October 24, 2024 15:32 7m 31s feature/cc89-cutlass-w8a8
feat: add triton kernels to decrease latency of large batches
Server Tests #3278: Pull request #2687 opened by OlivierDehaene
October 24, 2024 14:49 7m 2s feat/triton_prepare
fix: improve find_segments via numpy diff
Server Tests #3277: Pull request #2686 opened by drbh
October 24, 2024 14:20 8m 49s improve-find-segments-function
Add support for FP8 KV cache scales
Server Tests #3275: Pull request #2628 synchronize by danieldk
October 24, 2024 12:37 9m 11s feature/fp8-kv-cache-scale
Choosing input/total tokens automatically based on available VRAM?
Server Tests #3274: Pull request #2673 synchronize by Narsil
October 24, 2024 09:54 6m 40s auto_length
Choosing input/total tokens automatically based on available VRAM?
Server Tests #3273: Pull request #2673 synchronize by Narsil
October 24, 2024 09:39 9m 8s auto_length
Add support for FP8 KV cache scales
Server Tests #3272: Pull request #2628 synchronize by danieldk
October 24, 2024 08:50 8m 52s feature/fp8-kv-cache-scale
Choosing input/total tokens automatically based on available VRAM?
Server Tests #3271: Pull request #2673 synchronize by Narsil
October 24, 2024 08:07 7m 16s auto_length
Choosing input/total tokens automatically based on available VRAM?
Server Tests #3270: Pull request #2673 synchronize by Narsil
October 24, 2024 07:58 9m 12s auto_length
Choosing input/total tokens automatically based on available VRAM?
Server Tests #3269: Pull request #2673 synchronize by Narsil
October 24, 2024 04:55 9m 1s auto_length
feat: allow any supported payload on /invocations
Server Tests #3268: Pull request #2683 synchronize by OlivierDehaene
October 23, 2024 10:20 7m 2s feat/aws_invocations
feat: allow any supported payload on /invocations
Server Tests #3267: Pull request #2683 opened by OlivierDehaene
October 23, 2024 10:04 8m 49s feat/aws_invocations
feat: natively support Granite models
Server Tests #3266: Pull request #2682 synchronize by OlivierDehaene
October 23, 2024 10:03 8m 43s feat/granite
Choosing input/total tokens automatically based on available VRAM?
Server Tests #3265: Pull request #2673 synchronize by Narsil
October 23, 2024 10:03 9m 20s auto_length
Choosing input/total tokens automatically based on available VRAM?
Server Tests #3264: Pull request #2673 synchronize by Narsil
October 23, 2024 09:26 7m 26s auto_length
feat: natively support Granite models
Server Tests #3263: Pull request #2682 opened by OlivierDehaene
October 23, 2024 09:10 9m 44s feat/granite
Choosing input/total tokens automatically based on available VRAM?
Server Tests #3262: Pull request #2673 synchronize by Narsil
October 23, 2024 07:24 7m 16s auto_length
Choosing input/total tokens automatically based on available VRAM?
Server Tests #3261: Pull request #2673 synchronize by Narsil
October 23, 2024 07:03 9m 39s auto_length
Add support for stop words in TRTLLM
Server Tests #3259: Pull request #2678 synchronize by mfuntowicz
October 22, 2024 21:06 8m 42s trtllm-stop-words