TRT-LLM export P-tuning related fixes #8863

apanteleev · 2024-04-09T21:04:55Z

What does this PR do?

Fixes the implementation of PEFT for TRT-LLM export.

Collection: NLP

Changelog

Add the BOS token to LLAMA-based models (not specific to P-tuning; this improves LLAMA results in general).
Remember the vtoken count of each p-tuning table when the table is added.
Prepend the right number of vtokens to each query based on its task_id.
Preserve the dtype of the p-tuning table when it is padded.
Validate that all p-tuning tables fit within the max_prompt_embedding_table_size limit.
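
The p-tuning items above can be illustrated with a minimal sketch. This is not the code from this PR: the class `PromptTableRegistry`, its method names, and the use of NumPy are hypothetical, chosen only for illustration. It also assumes that virtual tokens are encoded as ids starting at `vocab_size`, and that the size limit applies to the sum of the rows of all tables; neither detail is stated in this PR description.

```python
import numpy as np


class PromptTableRegistry:
    """Hypothetical helper, not the NeMo or TRT-LLM API."""

    def __init__(self, max_prompt_embedding_table_size: int, hidden_size: int):
        self.max_size = max_prompt_embedding_table_size
        self.hidden_size = hidden_size
        self.tables = []         # one 2-D array per task, indexed by task_id
        self.vtoken_counts = []  # remembered at the moment a table is added

    def add_table(self, table: np.ndarray) -> int:
        """Register a table and return its task_id."""
        if table.ndim != 2 or table.shape[1] != self.hidden_size:
            raise ValueError("expected a table of shape (num_vtokens, hidden_size)")
        # Assumption: the limit is checked against the total number of rows.
        if sum(self.vtoken_counts) + table.shape[0] > self.max_size:
            raise ValueError("p-tuning tables exceed max_prompt_embedding_table_size")
        self.tables.append(table)
        self.vtoken_counts.append(table.shape[0])
        return len(self.tables) - 1

    def padded_tables(self) -> np.ndarray:
        """Stack all tables, zero-padding each to the longest one."""
        if not self.tables:
            raise ValueError("no p-tuning tables have been added")
        longest = max(self.vtoken_counts)
        # Allocate with the tables' dtype so padding does not upcast fp16/bf16.
        dtype = self.tables[0].dtype
        out = np.zeros((len(self.tables), longest, self.hidden_size), dtype=dtype)
        for i, table in enumerate(self.tables):
            out[i, : table.shape[0]] = table
        return out

    def build_input_ids(self, token_ids, task_id: int, vocab_size: int) -> list:
        """Prepend as many vtoken ids as the table for task_id has rows."""
        n = self.vtoken_counts[task_id]
        # Assumption: vtokens are ids vocab_size, vocab_size + 1, ...
        vtokens = list(range(vocab_size, vocab_size + n))
        return vtokens + list(token_ids)
```

With two tables of 3 and 5 vtokens, a query for the first task gets 3 vtoken ids prepended and a query for the second gets 5, while the padded stack keeps the dtype of the tables that were added.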

PR Type:

New Feature
Bugfix
Documentation

Additional Information

Related to bug 4350064

Signed-off-by: Alexey Panteleev <alpanteleev@nvidia.com>

- Remember the vtoken counts for each p-tuning table when the tables are added;
- Prepend the right number of vtokens to each query based on its task_id;
- Preserve the dtype of the p-tuning table when it is padded;
- Validate that all p-tuning tables fit into max_prompt_embedding_table_size limit.
Signed-off-by: Alexey Panteleev <alpanteleev@nvidia.com>


* Fixed the uses of pathlib.Path.
  Signed-off-by: Alexey Panteleev <alpanteleev@nvidia.com>
* Add the bos token to LLAMA based models.
  Signed-off-by: Alexey Panteleev <alpanteleev@nvidia.com>
* P-tuning related fixes:
  - Remember the vtoken counts for each p-tuning table when the tables are added;
  - Prepend the right number of vtokens to each query based on its task_id;
  - Preserve the dtype of the p-tuning table when it is padded;
  - Validate that all p-tuning tables fit into max_prompt_embedding_table_size limit.
  Signed-off-by: Alexey Panteleev <alpanteleev@nvidia.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci

---------

Signed-off-by: Alexey Panteleev <alpanteleev@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Pablo Garay <palenq@gmail.com>
Signed-off-by: Ao Tang <aot@nvidia.com>


apanteleev force-pushed the ptuning branch from ae8d0c3 to 5119eae Compare April 9, 2024 21:08

github-actions bot added core Changes to NeMo Core NLP CI Multi Modal labels Apr 9, 2024

oyilmaz-nvidia self-requested a review April 9, 2024 21:08

apanteleev added 3 commits April 9, 2024 14:09

Fixed the uses of pathlib.Path.

0c94830

Signed-off-by: Alexey Panteleev <alpanteleev@nvidia.com>

Add the bos token to LLAMA based models.

ca4a1dc

Signed-off-by: Alexey Panteleev <alpanteleev@nvidia.com>

apanteleev force-pushed the ptuning branch from 3790cd9 to 227426f Compare April 9, 2024 21:10

github-actions bot removed core Changes to NeMo Core NLP CI Multi Modal labels Apr 9, 2024

pre-commit-ci bot and others added 3 commits April 9, 2024 21:11

[pre-commit.ci] auto fixes from pre-commit.com hooks

07ce7cd

for more information, see https://pre-commit.ci

Merge branch 'main' into ptuning

df5a2eb

Merge branch 'main' into ptuning

ab6f7e8

oyilmaz-nvidia approved these changes May 2, 2024


oyilmaz-nvidia merged commit d66ca99 into NVIDIA:main May 2, 2024
125 checks passed
