📚A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc.🎉
A nearly-live implementation of OpenAI's Whisper.
🏋️ A unified multi-backend utility for benchmarking Transformers, Timm, PEFT, Diffusers and Sentence-Transformers with full support of Optimum's hardware optimizations & quantization schemes.
Higher-performance OpenAI LLM service than vLLM serve: a pure C++ high-performance OpenAI-compatible LLM service implemented with GRPS + TensorRT-LLM + Tokenizers.cpp, supporting chat and function calls, AI agents, distributed multi-GPU inference, multimodal capabilities, and a Gradio chat interface.
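Since the service above exposes an OpenAI-compatible chat API, a client request can be sketched as follows. This is a minimal sketch: the host, port, and model name are assumptions for illustration, not values taken from the project.

```python
import json

# Hypothetical endpoint of an OpenAI-compatible server (host/port are assumptions).
BASE_URL = "http://localhost:9997/v1/chat/completions"

def build_chat_request(prompt: str, model: str = "qwen2-instruct") -> dict:
    """Build an OpenAI-style chat-completion payload; the model name is a placeholder."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": prompt},
        ],
        "temperature": 0.7,
    }

payload = build_chat_request("What is TensorRT-LLM?")
print(json.dumps(payload, indent=2))

# To actually send it (requires a running server and the `requests` package):
# import requests
# resp = requests.post(BASE_URL, json=payload, timeout=60)
# print(resp.json()["choices"][0]["message"]["content"])
```

Any OpenAI-compatible client (including the official `openai` Python SDK pointed at a custom `base_url`) would work the same way against such a server.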
Chat With RTX Python API
Add-in for the new Outlook that adds LLM-powered features (composition, summarization, Q&A). It uses a local LLM via NVIDIA TensorRT-LLM.
A tool for benchmarking LLMs on Modal
LLM inference framework acceleration: make LLMs fly.
LLM tutorial materials including, but not limited to, NVIDIA NeMo, TensorRT-LLM, Triton Inference Server, and NeMo Guardrails.
Whisper optimization for real-time applications.
A simple project demonstrating LLM-assisted review of documentation on Atlassian Confluence.
A Large Language Model (LLM) oriented project providing easy-to-use features such as RAG, translation, and summarization.