Combining Inference and PEFT Tokens in a Batch #1153

jiazhihao · 2023-09-27T14:49:12Z

Description of changes:

This PR allows LLM inference tokens and PEFT (parameter-efficient fine-tuning) tokens to be combined in the same batch, enabling kernel-level optimizations.

TODOs (first release):

Fix fusion issue
Optimizers (PR Peft optimizer #1290)
Align cross entropy loss
Implement code to initialize a new PEFT from scratch (instead of continuing the finetuning of existing ones)
Support uploading back to Hugging Face (PR HuggingFace Upload Feature #1300)
Python interface, similar to PEFT
Add option to add LoRA layers to every linear layer
Demo!

TODOs (remaining ones):

Move QKV and output projections from attention into separate operators to enable adding LoRA to them (PR modify constructor and allocate weights #1428).
Update implementation, debug and merge token-level finetuning (https://arxiv.org/abs/2402.18789, draft PR Peft token level fwd/bwd #1233)
Align models other than LLAMA (OPT, Falcon, etc.)
Add support for LLAMA-3, Phi, Mistral
Grouped GEMM kernels for multi-tenant LoRA
HIP support
Add softmax automatically (for categorical cross-entropy loss)

Related Issues:

Linked Issues:

Issue #

Issues closed by this PR:

Closes #


goliaro · 2023-10-19T20:17:23Z

inference/models/llama.cc

@@ -255,7 +257,8 @@ void LLAMA::create_llama_model(FFModel &ff,
      output = ff.sampling(softmax, generation_config.topp);
    } else {
      // output = ff.arg_top_k(dense, /*k=*/1, false);
-      output = ff.argmax(dense, /*beam_Search*/ false);
+      Tensor softmax = ff.softmax(dense, -1);


@jiazhihao why is this change required?

Because we use a softmax cross-entropy loss, the last layer of the LLM must be a softmax.


jiazhihao added the "inference" label ("Features and fixes related to the inference project.") Sep 27, 2023

jiazhihao requested review from goliaro, zwang86, mengdiz97 and xinhaoc September 27, 2023 14:49

jiazhihao marked this pull request as draft September 27, 2023 14:50

wmdi requested review from wmdi and removed request for mengdiz97 September 27, 2023 19:29

goliaro reviewed Oct 19, 2023


jiazhihao and others added 18 commits October 23, 2023 12:12

.

d8e92e9

.

0a512d2

Update the default cublas behavior when CUDA_VERSION is not specified

4ee710a

Merge branch 'fix_cublas_default' of https://github.com/flexflow/Flex…

2adca3a

…Flow into peft

fix bugs in IncMHA peft_bwd kernel

464424e

resolve merge conflict

82d6e58

uncomment softmaxbackward

45c1e01

add layernorm to align test

07636e8

add peft test scripts

28a5e84

fix import

dd94370

fix

3c01328

add code to convert peft models

fa56364

add script to download peft for c++, fix bug

a484100

fix

c83c376

add script to fine-tune models

aa9f004

implement loading lora configs/weights from file

4609e9e

remove peft_bwd assertion failure in embedding

17fa6f3

fix download script

cdc12e6

goliaro force-pushed the peft branch from 617ba79 to cdc12e6 October 31, 2023 19:56

goliaro added 2 commits October 31, 2023 16:54

add peft dependencies in dockerfile

eb9e2b8

fix softmax backward

3dfa14d

goliaro added 28 commits August 16, 2024 22:43

fixes

440ad3d

more fixes

5cbe1a4

update

9ca3687

fix alignment up to input ln

d0e98ec

finished aligning all backward (tp>1)

6ebea46

align all peft

f98999c

Merge branch 'inference' into peft

5f73328

fix

b06ed1a

fix broken link

3fe93dc

formatting

1a2fce3

fix

cf4525f

update

90b2c87

Revert "update"

eae9b12

This reverts commit 90b2c87.

update

828f72e

fix hip build

a8294e8

fix gpu ci

ccb28b1

fix gpu ci

ec472c2

update default gpu ci version to 12.0

aa1aa7b

update ci to 12.0

9b2bd47

fix

f929cca

fix

39b8d49

update

a60618b

fix

b8be6f5

fix

0330272

update

5ab1da5

fix

08012b0

add cleanup

e5785e6

downgrade to cuda=11.8

c37f363

goliaro marked this pull request as ready for review September 4, 2024 17:27

goliaro merged commit a0f1ed7 into inference Sep 4, 2024
57 of 58 checks passed

jiazhihao commented Sep 27, 2023 • edited by goliaro