Are there any plans to integrate inference of the Canary model with TensorRT-LLM? #8899
-
cc: @galv
-
We are working on Parakeet in Triton Inference Server using the Python backend right now, but not Canary: https://github.com/NVIDIA/NeMo/pull/8673/files#diff-11846c15f5c57c285b422afe55fafd8d6daedc60fa814371c81a73da0c39aa22 I haven't posted any results, but the throughputs being achieved are very close to those of running transcribe_speech.py (about 1300 RTFx on an A100 on LibriSpeech test-other at batch size 32).

You may note that there is a Whisper implementation in TensorRT-LLM. The TensorRT-LLM team is working on continuous batching for encoder-decoder style models like Whisper, which in my view is required for optimal throughput; it isn't done yet. (Continuous batching matters less for Parakeet models, because their output sequence length distribution has less variance.) The central problem is that you need a large batch size to fully saturate your GPU, but at large batch sizes you will see higher variance in output sequence lengths, which causes wasted computation in non-continuous-batching implementations.

Anyway, the point is that this work on Whisper will carry over to Canary as well, because they share the same style of architecture, but I cannot give you any timeline on that.
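To make the wasted-computation point concrete, here is a small back-of-the-envelope sketch (illustrative only, not NeMo or TensorRT-LLM code; the output-length distributions are made up). With static batching, every slot in the batch keeps running decoder steps until the longest sequence finishes, so the higher the variance in output lengths, the more steps are spent on sequences that have already emitted EOS:

```python
# Back-of-the-envelope sketch: wasted decoder steps under static batching.
# Not NeMo or TensorRT-LLM code; the length distributions below are made up.
import numpy as np

rng = np.random.default_rng(0)

def wasted_fraction(output_lens: np.ndarray) -> float:
    """Fraction of decode steps spent on sequences that have already finished."""
    static_steps = len(output_lens) * output_lens.max()  # batch runs until the longest sequence ends
    needed_steps = output_lens.sum()                     # ideal, continuous-batching-like cost
    return 1.0 - needed_steps / static_steps

# Hypothetical output-length distributions at batch size 32:
low_var = rng.normal(loc=100, scale=5, size=32).clip(min=1).astype(int)    # low variance (Parakeet-like)
high_var = rng.normal(loc=100, scale=40, size=32).clip(min=1).astype(int)  # high variance (Whisper/Canary-like)

print(f"wasted steps, low-variance lengths:  {wasted_fraction(low_var):.1%}")
print(f"wasted steps, high-variance lengths: {wasted_fraction(high_var):.1%}")
```

The exact numbers depend on the real length distribution, but the high-variance case wastes a substantially larger share of decode compute, which is what continuous batching avoids by refilling finished slots with new requests.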
-
It would be nice to have an inference example for the Canary model with Triton, ideally an optimized version with an efficient backend implementation such as TensorRT-LLM.
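For reference, a Triton Python-backend model for Canary might look roughly like the sketch below. This is a hedged illustration only, not the implementation in the PR linked above: the tensor names ("AUDIO", "TRANSCRIPT"), the checkpoint id, and in-memory transcribe() support are assumptions, and it does none of the batching optimizations discussed above.

```python
# model.py for Triton's Python backend -- a hedged sketch, not the NeMo PR linked above.
# Tensor names, the checkpoint id, and in-memory transcribe() support are assumptions.
import numpy as np
import triton_python_backend_utils as pb_utils
from nemo.collections.asr.models import EncDecMultiTaskModel


class TritonPythonModel:
    def initialize(self, args):
        # Load the Canary checkpoint once when Triton loads the model.
        self.model = EncDecMultiTaskModel.from_pretrained("nvidia/canary-1b")
        self.model.eval()

    def execute(self, requests):
        responses = []
        for request in requests:
            # Expect a float32 waveform tensor named "AUDIO" (16 kHz mono).
            audio = pb_utils.get_input_tensor_by_name(request, "AUDIO").as_numpy()
            # Recent NeMo releases accept in-memory arrays; older ones expect file paths.
            hyp = self.model.transcribe([audio.squeeze()], batch_size=1)[0]
            text = hyp.text if hasattr(hyp, "text") else str(hyp)
            out = pb_utils.Tensor("TRANSCRIPT", np.array([text], dtype=object))
            responses.append(pb_utils.InferenceResponse(output_tensors=[out]))
        return responses
```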