Update TensorRT-LLM #2562

Merged
merged 1 commit into from
Dec 11, 2024

@kaiyux (Member) commented Dec 11, 2024

  • Features
    • The LLM API
      • Added lookahead decoding support.
      • Added DeepSeek V1 support.
      • Added Medusa support.
    • Added support for LogN scaling for Qwen models.
    • Added quantization support for RecurrentGemma. Refer to examples/recurrentgemma/README.md.
    • Added AutoAWQ checkpoints support for Qwen. Refer to the “INT4-AWQ” section in examples/qwen/README.md.
  • API
    • [BREAKING CHANGE] Chunked context is now enabled by default when KV cache and paged context FMHA are enabled on non-RNN-based models.
    • [BREAKING CHANGE] Embedding sharing is now enabled automatically when possible, and the --use_embedding_sharing flag has been removed from the checkpoint conversion scripts.
  • Bug fixes
  • Infra
    • The base Docker image for TensorRT-LLM is updated to nvcr.io/nvidia/pytorch:24.11-py3.
    • The base Docker image for TensorRT-LLM Backend is updated to nvcr.io/nvidia/tritonserver:24.11-py3.
    • The dependent TensorRT version is updated to 10.7.
    • The dependent CUDA version is updated to 12.6.3.
    • Starting from the latest release, TensorRT-LLM Python wheels available on PyPI support both Python 3.10 and Python 3.12.
  • Known Issues
    • The Windows build is currently broken; the team is working on a fix.
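
For readers unfamiliar with the LogN scaling item under Features, here is a minimal sketch of the idea as it is commonly described for Qwen models: query vectors at positions beyond the training context length are scaled by the logarithm of the position taken to the base of the training length, and positions within the training length are left unscaled. The function name `logn_scale` and the parameter `train_len` are illustrative assumptions, not TensorRT-LLM APIs, and this does not claim to mirror the implementation merged in this PR.

```python
import math


def logn_scale(position: int, train_len: int) -> float:
    """Return the LogN attention scale for a 1-based token position.

    Hypothetical helper for illustration only: positions inside the
    training context keep a scale of 1.0, while longer positions use
    log base `train_len` of the position.
    """
    if position <= train_len:
        return 1.0
    return math.log(position, train_len)


# Scale factors for a few positions, assuming a 2048-token training context.
for pos in (1, 2048, 4096, 8192):
    print(pos, round(logn_scale(pos, 2048), 4))
```

The factor grows slowly with position, so queries far beyond the training length are amplified only mildly.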

@kaiyux merged commit aaacc9b into main on Dec 11, 2024
@kaiyux deleted the preview/main branch on December 11, 2024 08:31