Add new backends & support multi-modal LM #142

Merged: 3 commits, Feb 23, 2024

Conversation

@ys-2020 (Contributor) commented on Feb 23, 2024:

We add new backend kernels (W4A16 GEMM / GEMV) for faster inference. This PR also adds support for multi-modal models (VILA and LLaVA).

Note: This PR changes the weight packing format. Please re-quantize the model, or use tinychat/offline-weight-repacker.py to repack old AWQ checkpoints.
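
For reference, a repacking run might look roughly like the sketch below. The flag names and checkpoint file names are assumptions for illustration, not the script's confirmed interface; check the script itself for the actual arguments.

```
# Hypothetical invocation -- flag names are assumptions, not the confirmed CLI.
# Input:  an AWQ checkpoint quantized before this PR (old packing format).
# Output: the same weights repacked for the new W4A16 GEMM/GEMV kernels.
python tinychat/offline-weight-repacker.py \
    --checkpoint llama-2-7b-w4-g128-awq.pt \
    --output     llama-2-7b-w4-g128-awq-v2.pt
```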

@kentang-mit (Contributor) left a comment:

I've reviewed this PR in our internal codebase. I will approve it.

@w32zhong commented:

@ys-2020 How much inference speed improvement do the new kernels bring for batch=1 and GEMV?
