
fix: utilities to post process checkpoint for LoRA #338

Merged (39 commits) on Sep 25, 2024

Conversation

@Ssukriti (Collaborator) commented Sep 10, 2024

Description of the change

Adds a utility function to post-process a checkpoint after LoRA tuning, converting it to the format required by vLLM. This needs to be called at the end of LoRA tuning to allow inference on the LoRA adapter for models to which we have added new tokens.

Since loading adapters.safetensors is fast enough, it was added as a post-processing function.

This PR adds a script that can be called after tuning to do the processing.

Related issue number

https://github.ibm.com/ai-foundation/watson-fm-stack-tracker/issues/1210
Details on the vLLM issue: vllm-project/vllm#2816 (comment)

Context: embedding vectors for new tokens need to be placed in new_embeddings.safetensors, and lm_head.weight should be deleted from adapter_model.safetensors, as per vllm-project/vllm#2816 (comment).
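A minimal sketch of the post-processing described above, assuming the safetensors library; the function name and the tensor-key matching are assumptions, not the exact code in this PR's script:

import os
from safetensors.torch import load_file, save_file

def post_process_adapter_for_vllm(checkpoint_dir: str) -> None:
    # Move new-token embedding weights out of the adapter file so vLLM can
    # load them from new_embeddings.safetensors.
    adapter_path = os.path.join(checkpoint_dir, "adapter_model.safetensors")
    tensors = load_file(adapter_path)
    new_embeddings = {}
    for key in list(tensors):
        # lm_head.weight must be deleted from adapter_model.safetensors;
        # embedding rows belong in new_embeddings.safetensors instead.
        if key.endswith("lm_head.weight"):
            new_embeddings["output_embeddings"] = tensors.pop(key)
        elif key.endswith("embed_tokens.weight"):
            new_embeddings["input_embeddings"] = tensors.pop(key)
    if new_embeddings:
        save_file(new_embeddings, os.path.join(checkpoint_dir, "new_embeddings.safetensors"))
        save_file(tensors, adapter_path)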

How to verify the PR

  1. The function was called on a llama model that was LoRA tuned; the post-processed checkpoint is being tested on vLLM.
  2. The PR was verified with every unique model architecture we support.

Was the PR tested

  • I have added >=1 unit test(s) for every new method I have added.
  • I have ensured all unit tests pass.

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
@kmehant (Collaborator) left a comment

Couple of comments, thanks

Signed-off-by: Angel Luu <angel.luu@us.ibm.com>

Thanks for making a pull request! 😃
One of the maintainers will review and advise on the next steps.

Signed-off-by: Angel Luu <angel.luu@us.ibm.com>
@aluu317 (Collaborator) commented Sep 18, 2024

I added a commit to add a unit test in test_merge_model_utils.py. Steps to run it (since the GitHub Action skips it due to not having CUDA):

  • Start a sleep pod with a volume mount to our fmaas-integration-tests COS bucket.
  • Change the value in the test file just for this run:
DUMMY_TUNED_LLAMA_WITH_ADDED_TOKENS = "/testing/tuning/output/llama_dummy_LoRA/checkpoint-35"
  • Copy the test file into the pod:
oc cp tests/utils/test_merge_model_utils.py angel-sleep-pod-sft-trainer:/home/tuning/.local/lib/python3.11/site-packages/tuning/utils/test_merge_model_utils.py
  • In the sleep pod, pip install pytest. Then run:
[tuning@angel-sleep-pod-sft-trainer app]$ cd /home/tuning/.local/lib/python3.11/site-packages/tuning/utils
[tuning@angel-sleep-pod-sft-trainer utils]$ pytest test_merge_model_utils.py
======================================= test session starts ========================================
platform linux -- Python 3.11.7, pytest-8.3.3, pluggy-1.5.0
rootdir: /home/tuning/.local/lib/python3.11/site-packages/tuning/utils
collected 1 item                                                                                   

test_merge_model_utils.py .                                                                  [100%]

======================================== 1 passed in 3.50s =========================================

Signed-off-by: Will Johnson <mwjohnson728@gmail.com>
@willmj willmj changed the title utilities to post process checkpoint for LoRA fix: utilities to post process checkpoint for LoRA Sep 18, 2024
@github-actions github-actions bot added the fix label Sep 18, 2024
Signed-off-by: Will Johnson <mwjohnson728@gmail.com>
tuning/sft_trainer.py (review thread resolved, outdated)
willmj and others added 2 commits September 18, 2024 18:06
Signed-off-by: Will Johnson <mwjohnson728@gmail.com>
* get num_added_tokens

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

* remove extra code

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

---------

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
Ssukriti and others added 4 commits September 19, 2024 11:11
Signed-off-by: Angel Luu <angel.luu@us.ibm.com>
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
Signed-off-by: Abhishek <maurya.abhishek@ibm.com>
@willmj (Collaborator) commented Sep 20, 2024

Note these tests are failing because they are running old unit tests (see changes in the previous commit).
Test failure (only 10 args): [screenshot]
What it should be running (11 args): [screenshot]

"pad_token": "<pad>",
}
)
special_tokens_dict["bos_token"] = "<s>"
@Ssukriti (Collaborator, Author) replied:

This was just done to ensure we add tokens and resize embeddings in the same function.

return trainer
additional_metadata = {}
additional_metadata["added_tokens_info"] = added_tokens_dict
return trainer, additional_metadata
@Ssukriti (Collaborator, Author) commented Sep 23, 2024:

Returning additional metadata from train() containing information on newly added tokens. @kmehant @ashokponkumar let me know if this change is ok with you; it might be disruptive if you are using the SDK and were relying on a single return value. We can make this a major release to clarify that the API changed.

Alternatively, if the return value is an issue, I can write the file artifacts I want inside output_dir and later copy them to save_model_dir as well, avoiding returning the dict. I just thought this return value could be extensible to any other metadata we need in the future.
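If the two-value return lands, a caller-side sketch might look like this (the argument list of train() is assumed here, not taken from tuning/sft_trainer.py):

# Before: trainer = train(...)
# After: unpack the metadata alongside the trainer.
trainer, additional_metadata = train(model_args, data_args, training_args, tune_config)
added_tokens_info = additional_metadata["added_tokens_info"]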

A collaborator replied:

If we remove custom addition of tokens from fms-hf-tuning, we might not need to return this added-tokens metadata.

@@ -0,0 +1,29 @@
{
@Ssukriti (Collaborator, Author) replied:

These files are all just dummy LoRA artifacts needed for unit tests.

@anhuong (Collaborator) left a comment

Great work all! A few questions and changes; thanks for adding docs! I know you talked about adding some unit tests for tokenizer_data_utils, which would be good as well.

scripts/post_process_adapters_vLLM.py (resolved)
scripts/post_process_adapters_vLLM.py (resolved, outdated)
README.md (resolved)
scripts/post_process_adapters_vLLM.py (resolved)
scripts/post_process_adapters_vLLM.py (resolved)
tests/utils/test_merge_model_utils.py (resolved, outdated)
@@ -44,3 +52,4 @@ def tokenizer_and_embedding_resize(

input_embeddings[-num_new_tokens:] = input_embeddings_avg
output_embeddings[-num_new_tokens:] = output_embeddings_avg
return {"num_new_tokens": num_new_tokens, "new_embedding_size": embedding_size}
A collaborator asked:

Just to verify, new_embedding_size could be:

  • the new size of the embeddings after new tokens are added,
  • the new size of the embeddings after resizing to multiple_of,
  • or the same size if none of the above occurs.

@Ssukriti (Collaborator, Author) replied:

It's the size of the embeddings at the end of tuning, whichever case applies. It will be helpful for deciphering weight names based on size.
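For context, a minimal sketch of the resize flow being discussed, assuming HF transformers and PyTorch; the multiple_of rounding and mean-initialization mirror the diff above, but the exact function body is an assumption:

import math

def tokenizer_and_embedding_resize(special_tokens_dict, tokenizer, model, multiple_of=1):
    num_new_tokens = tokenizer.add_special_tokens(special_tokens_dict)
    # Round the vocabulary size up to a multiple_of boundary, then resize.
    embedding_size = int(multiple_of * math.ceil(len(tokenizer) / multiple_of))
    model.resize_token_embeddings(embedding_size)
    if num_new_tokens > 0:
        input_embeddings = model.get_input_embeddings().weight.data
        output_embeddings = model.get_output_embeddings().weight.data
        # New rows are initialized to the mean of the pre-existing embeddings.
        input_embeddings[-num_new_tokens:] = input_embeddings[:-num_new_tokens].mean(dim=0, keepdim=True)
        output_embeddings[-num_new_tokens:] = output_embeddings[:-num_new_tokens].mean(dim=0, keepdim=True)
    return {"num_new_tokens": num_new_tokens, "new_embedding_size": embedding_size}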

tuning/sft_trainer.py (resolved)
scripts/post_process_adapters_vLLM.py (resolved)
"w",
encoding="utf-8",
) as f:
json.dump(additional_train_info["added_tokens_info"], f)
A collaborator asked:

Should we check that num_new_tokens > 0 and otherwise skip writing this file? This is for the case where the model already has all the tokens it needs, so no new tokens are added and no embedding resize happens.

@Ssukriti (Collaborator, Author) replied:

No, we write it in all cases to differentiate between "tuning was not done on the latest code" and "no tokens were added". The file will always exist; it will contain the value 0 if no tokens were added.
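A consumer-side sketch of what this always-write policy enables; the file name added_tokens_info.json is an assumption:

import json
import os

info_path = os.path.join(checkpoint_dir, "added_tokens_info.json")
if not os.path.exists(info_path):
    # The checkpoint predates this feature, so token info is unknown.
    print("tuning was not done on the latest code")
else:
    with open(info_path, encoding="utf-8") as f:
        # A value of 0 means tuning ran on the latest code but added no tokens.
        num_new_tokens = json.load(f)["num_new_tokens"]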

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
@Ssukriti (Collaborator, Author) commented:

@anhuong @willmj I have added sufficient unit tests pertaining to this change. Let's wait on @ashokponkumar to check the PR as well.

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
@ashokponkumar (Collaborator) left a comment

A few clarifications...

tuning/utils/merge_model_utils.py (resolved)
tuning/utils/merge_model_utils.py (resolved, outdated)
Comment on lines +249 to +258
    special_tokens_dict = {}
    if not model_args.tokenizer_name_or_path:
        # TODO: understand if we need to hardcode these here or just use defaults in model
        if isinstance(tokenizer, (LlamaTokenizer, LlamaTokenizerFast)):
            tokenizer.add_special_tokens(
                {
                    "bos_token": "<s>",
                    "eos_token": "</s>",
                    "unk_token": "<unk>",
                    "pad_token": "<pad>",
                }
            )
            special_tokens_dict["bos_token"] = "<s>"
            special_tokens_dict["eos_token"] = "</s>"
            special_tokens_dict["unk_token"] = "<unk>"
            special_tokens_dict["pad_token"] = "<pad>"
        elif isinstance(tokenizer, (GPT2Tokenizer, GPTNeoXTokenizerFast)):
            tokenizer.add_special_tokens(
                {
                    "pad_token": "<pad>",
                }
            )
            special_tokens_dict["pad_token"] = "<pad>"
A collaborator asked:

Sorry for my ignorance.

But just wondering: why not remove this logic that is specific to the Llama or GPT2 tokenizer from fms-hf-tuning, and just assume the base model already has these tokens? This would remove the need for all these additional tokens. For example, I am not sure why the unk token is being added; my understanding was that BPE-based tokenizers never end up creating unk tokens.

Or we could make this token addition an external flag that allows adding custom tokens to any tokenizer. This would make fms-hf-tuning more generic.

@Ssukriti (Collaborator, Author) replied:

I feel there is often some confusion here. This piece of code was taken from open-instruct (https://github.com/allenai/open-instruct/blob/main/open_instruct/finetune.py#L635); as they mention, although the tokens are already in the model, they are not added as special tokens. This can have implications for quality, since models treat special tokens differently from regular tokens (they are never split, etc.). Hence this change just marks those tokens as special tokens and does not add any new ones.

num_added_tokens will just be the pad token.
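To illustrate the rationale (token strings follow the diff above; the count assumes a Llama tokenizer whose vocabulary already contains <s>, </s>, and <unk>):

num_added_tokens = tokenizer.add_special_tokens(
    {"bos_token": "<s>", "eos_token": "</s>", "unk_token": "<unk>", "pad_token": "<pad>"}
)
# add_special_tokens() returns the number of tokens newly added to the
# vocabulary; here only <pad> is new, so num_added_tokens == 1. The other
# three are registered as special tokens without growing the vocabulary.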

@Ssukriti (Collaborator, Author) added:

A pad token always needs to be added for any model architecture; that is an open-source requirement of SFT Trainer, without which training will not proceed. Hence we always add a pad token for all architectures here: https://github.com/foundation-model-stack/fms-hf-tuning/blob/main/tuning/sft_trainer.py#L278 (except granite, which already has a pad token).

Hence a minimum of 1 token is always added, and hence post-processing is needed.
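A minimal sketch of that generic rule ("if pad token is None, set it"); the actual logic lives in tuning/sft_trainer.py:

# Applies to any architecture whose tokenizer lacks a pad token.
if tokenizer.pad_token is None:
    special_tokens_dict["pad_token"] = "<pad>"
    tokenizer.add_special_tokens(special_tokens_dict)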


tests/utils/test_merge_model_utils.py (resolved, outdated)
tuning/utils/merge_model_utils.py (resolved)
anhuong and others added 5 commits September 25, 2024 14:00
Signed-off-by: Anh Uong <anh.uong@ibm.com>
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
@Ssukriti (Collaborator, Author) commented:

@ashokponkumar as explained on Slack, we cannot remove the addition of the pad token, as it is a known requirement in open source. Without it we will run into this error: https://stackoverflow.com/questions/70544129/transformers-asking-to-pad-but-the-tokenizer-does-not-have-a-padding-token

Most open-source models do not have a pad token: the new llama3.1, llama3, llama, allam, mixtral, mistral. Hence we have to add at minimum 1 token for all these architectures, which we do in a generic manner: "if pad token is None, set it".

This PR is thus doing the post-processing needed to handle the addition of any token for LoRA inference on vLLM. Without this PR, LoRA inference on vLLM does not work for any of the above architectures.

Even if we remove other tokens, like unk, etc. (which can be done in a follow-up PR and issue), the change is still needed for the pad token. Hence it is better to keep the change generic and return the number of added tokens from the code.

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
@anhuong (Collaborator) left a comment

LGTM 🚀

@Ssukriti Ssukriti enabled auto-merge (squash) September 25, 2024 21:11
@Ssukriti Ssukriti dismissed kmehant’s stale review September 25, 2024 21:13

changes addressed and reviewer not available

@Ssukriti Ssukriti merged commit 7714dfc into main Sep 25, 2024
10 checks passed