
fix: utilities to post process checkpoint for LoRA #338

Merged (39 commits) on Sep 25, 2024

Conversation

@Ssukriti (Collaborator) commented Sep 10, 2024

Description of the change

Adds a utility function to post-process a checkpoint after LoRA tuning, converting it to the format required by vLLM. This needs to be called at the end of LoRA tuning to allow inference on the LoRA adapter for models to which we have added new tokens.

Since loading adapters.safetensors is fast enough, it was added as a post-processing function.

This PR adds a script that can be called after tuning to do the processing.

Related issue number

https://github.ibm.com/ai-foundation/watson-fm-stack-tracker/issues/1210
Details on the vLLM issue: vllm-project/vllm#2816 (comment)

Context: embedding vectors for new tokens need to be placed in new_embeddings.safetensors, and lm_head.weight should be deleted from adapter_model.safetensors, as per vllm-project/vllm#2816 (comment).
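A minimal sketch of the post-processing described above, assuming the safetensors library; the function name and the tensor-key matching are assumptions, not the exact code in this PR's script:

import os
from safetensors.torch import load_file, save_file

def post_process_adapter_for_vllm(checkpoint_dir: str) -> None:
    # Move new-token embedding weights out of the adapter file so vLLM can
    # load them from new_embeddings.safetensors.
    adapter_path = os.path.join(checkpoint_dir, "adapter_model.safetensors")
    tensors = load_file(adapter_path)
    new_embeddings = {}
    for key in list(tensors):
        # lm_head.weight must be deleted from adapter_model.safetensors;
        # embedding rows belong in new_embeddings.safetensors instead.
        if key.endswith("lm_head.weight"):
            new_embeddings["output_embeddings"] = tensors.pop(key)
        elif key.endswith("embed_tokens.weight"):
            new_embeddings["input_embeddings"] = tensors.pop(key)
    if new_embeddings:
        save_file(new_embeddings, os.path.join(checkpoint_dir, "new_embeddings.safetensors"))
        save_file(tensors, adapter_path)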

How to verify the PR

  1. The function was called on a llama model that was LoRA tuned; the post-processed checkpoint is being tested on vLLM.
  2. The PR was verified with every unique model architecture we support.

Was the PR tested

  • I have added >=1 unit test(s) for every new method I have added.
  • I have ensured all unit tests pass.

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
@kmehant (Collaborator) left a comment

Couple of comments, thanks

Signed-off-by: Angel Luu <angel.luu@us.ibm.com>

Thanks for making a pull request! 😃
One of the maintainers will review and advise on the next steps.

Signed-off-by: Angel Luu <angel.luu@us.ibm.com>
@aluu317 (Collaborator) commented Sep 18, 2024

I added a commit to add a unit test in test_merge_model_utils.py. Steps to run it (since the GitHub Action skips it due to not having CUDA):

  • Start a sleep pod with a volume mount to our fmaas-integration-tests COS bucket.
  • Change the value in the test file just for this run:
DUMMY_TUNED_LLAMA_WITH_ADDED_TOKENS = "/testing/tuning/output/llama_dummy_LoRA/checkpoint-35"
  • Copy the test file into the pod:
oc cp tests/utils/test_merge_model_utils.py angel-sleep-pod-sft-trainer:/home/tuning/.local/lib/python3.11/site-packages/tuning/utils/test_merge_model_utils.py
  • In the sleep pod, pip install pytest. Then run:
[tuning@angel-sleep-pod-sft-trainer app]$ cd /home/tuning/.local/lib/python3.11/site-packages/tuning/utils
[tuning@angel-sleep-pod-sft-trainer utils]$ pytest test_merge_model_utils.py
======================================= test session starts ========================================
platform linux -- Python 3.11.7, pytest-8.3.3, pluggy-1.5.0
rootdir: /home/tuning/.local/lib/python3.11/site-packages/tuning/utils
collected 1 item                                                                                   

test_merge_model_utils.py .                                                                  [100%]

======================================== 1 passed in 3.50s =========================================

Signed-off-by: Will Johnson <mwjohnson728@gmail.com>
@willmj willmj changed the title utilities to post process checkpoint for LoRA fix: utilities to post process checkpoint for LoRA Sep 18, 2024
@github-actions github-actions bot added the fix label Sep 18, 2024
Signed-off-by: Will Johnson <mwjohnson728@gmail.com>
tuning/sft_trainer.py (review thread resolved, outdated)
willmj and others added 2 commits September 18, 2024 18:06
Signed-off-by: Will Johnson <mwjohnson728@gmail.com>
* get num_added_tokens

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

* remove extra code

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>

---------

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
Ssukriti and others added 4 commits September 19, 2024 11:11
Signed-off-by: Angel Luu <angel.luu@us.ibm.com>
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
Signed-off-by: Abhishek <maurya.abhishek@ibm.com>
@willmj (Collaborator) commented Sep 20, 2024

Note these tests are failing because they are running old unit tests (see changes in the previous commit).
Test failure (only 10 args): [screenshot]
What it should be running (11 args): [screenshot]

"pad_token": "<pad>",
}
)
special_tokens_dict["bos_token"] = "<s>"
@Ssukriti (Collaborator, Author) replied:

This was just done to ensure we add tokens and resize embeddings in the same function.

return trainer
additional_metadata = {}
additional_metadata["added_tokens_info"] = added_tokens_dict
return trainer, additional_metadata
@Ssukriti (Collaborator, Author) commented Sep 23, 2024:

Returning additional metadata from train() containing information on newly added tokens. @kmehant @ashokponkumar let me know if this change is ok with you; it might be disruptive if you are using the SDK and were relying on a single return value. We can make this a major release to clarify that the API changed.

Alternatively, if the return value is an issue, I can write the file artifacts I want inside output_dir and later copy them to save_model_dir as well, avoiding returning the dict. I just thought this return value could be extensible to any other metadata we need in the future.
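If the two-value return lands, a caller-side sketch might look like this (the argument list of train() is assumed here, not taken from tuning/sft_trainer.py):

# Before: trainer = train(...)
# After: unpack the metadata alongside the trainer.
trainer, additional_metadata = train(model_args, data_args, training_args, tune_config)
added_tokens_info = additional_metadata["added_tokens_info"]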

A collaborator replied:

If we remove custom addition of tokens from fms-hf-tuning, we might not need to return this added-tokens metadata.

@@ -0,0 +1,29 @@
{
@Ssukriti (Collaborator, Author) replied:

These files are all just dummy LoRA artifacts needed for unit tests.

@anhuong (Collaborator) left a comment

Great work all! A few questions and changes; thanks for adding docs! I know you talked about adding some unit tests for tokenizer_data_utils, which would be good as well.

scripts/post_process_adapters_vLLM.py (resolved)
scripts/post_process_adapters_vLLM.py (resolved, outdated)
README.md (resolved)
scripts/post_process_adapters_vLLM.py (resolved)
scripts/post_process_adapters_vLLM.py (resolved)
tests/utils/test_merge_model_utils.py (resolved, outdated)
@@ -44,3 +52,4 @@ def tokenizer_and_embedding_resize(

input_embeddings[-num_new_tokens:] = input_embeddings_avg
output_embeddings[-num_new_tokens:] = output_embeddings_avg
return {"num_new_tokens": num_new_tokens, "new_embedding_size": embedding_size}
A collaborator asked:

Just to verify, new_embedding_size could be:

  • the new size of the embeddings after new tokens are added,
  • the new size of the embeddings after resizing to multiple_of,
  • or the same size if none of the above occurs.

@Ssukriti (Collaborator, Author) replied:

It's the size of the embeddings at the end of tuning, whichever case applies. It will be helpful for deciphering weight names based on size.
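For context, a minimal sketch of the resize flow being discussed, assuming HF transformers and PyTorch; the multiple_of rounding and mean-initialization mirror the diff above, but the exact function body is an assumption:

import math

def tokenizer_and_embedding_resize(special_tokens_dict, tokenizer, model, multiple_of=1):
    num_new_tokens = tokenizer.add_special_tokens(special_tokens_dict)
    # Round the vocabulary size up to a multiple_of boundary, then resize.
    embedding_size = int(multiple_of * math.ceil(len(tokenizer) / multiple_of))
    model.resize_token_embeddings(embedding_size)
    if num_new_tokens > 0:
        input_embeddings = model.get_input_embeddings().weight.data
        output_embeddings = model.get_output_embeddings().weight.data
        # New rows are initialized to the mean of the pre-existing embeddings.
        input_embeddings[-num_new_tokens:] = input_embeddings[:-num_new_tokens].mean(dim=0, keepdim=True)
        output_embeddings[-num_new_tokens:] = output_embeddings[:-num_new_tokens].mean(dim=0, keepdim=True)
    return {"num_new_tokens": num_new_tokens, "new_embedding_size": embedding_size}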

tuning/sft_trainer.py (resolved)
scripts/post_process_adapters_vLLM.py (resolved)
"w",
encoding="utf-8",
) as f:
json.dump(additional_train_info["added_tokens_info"], f)
A collaborator asked:

Should we check that num_new_tokens > 0 and otherwise skip writing this file? This is for the case where the model already has all the tokens it needs, so no new tokens are added and no embedding resize happens.

@Ssukriti (Collaborator, Author) replied:

No, we write it in all cases to differentiate between "tuning was not done on the latest code" and "no tokens were added". The file will always exist; it will contain the value 0 if no tokens were added.
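A consumer-side sketch of what this always-write policy enables; the file name added_tokens_info.json is an assumption:

import json
import os

info_path = os.path.join(checkpoint_dir, "added_tokens_info.json")
if not os.path.exists(info_path):
    # The checkpoint predates this feature, so token info is unknown.
    print("tuning was not done on the latest code")
else:
    with open(info_path, encoding="utf-8") as f:
        # A value of 0 means tuning ran on the latest code but added no tokens.
        num_new_tokens = json.load(f)["num_new_tokens"]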

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
@Ssukriti (Collaborator, Author) commented:

@anhuong @willmj I have added sufficient unit tests pertaining to this change. Let's wait on @ashokponkumar to check the PR as well.

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
@ashokponkumar (Collaborator) left a comment

A few clarifications...

tuning/utils/merge_model_utils.py (resolved)
tuning/utils/merge_model_utils.py (resolved, outdated)
Comment on lines +249 to +258
    special_tokens_dict = {}
    if not model_args.tokenizer_name_or_path:
        # TODO: understand if we need to hardcode these here or just use defaults in model
        if isinstance(tokenizer, (LlamaTokenizer, LlamaTokenizerFast)):
            tokenizer.add_special_tokens(
                {
                    "bos_token": "<s>",
                    "eos_token": "</s>",
                    "unk_token": "<unk>",
                    "pad_token": "<pad>",
                }
            )
            special_tokens_dict["bos_token"] = "<s>"
            special_tokens_dict["eos_token"] = "</s>"
            special_tokens_dict["unk_token"] = "<unk>"
            special_tokens_dict["pad_token"] = "<pad>"
        elif isinstance(tokenizer, (GPT2Tokenizer, GPTNeoXTokenizerFast)):
            tokenizer.add_special_tokens(
                {
                    "pad_token": "<pad>",
                }
            )
            special_tokens_dict["pad_token"] = "<pad>"
A collaborator asked:

Sorry for my ignorance.

But just wondering: why not remove this logic that is specific to the Llama or GPT2 tokenizer from fms-hf-tuning, and just assume the base model already has these tokens? This would remove the need for all these additional tokens. For example, I am not sure why the unk token is being added; my understanding was that BPE-based tokenizers never end up creating unk tokens.

Or we could make this token addition an external flag that allows adding custom tokens to any tokenizer. This would make fms-hf-tuning more generic.

@Ssukriti (Collaborator, Author) replied:

I feel there is often some confusion here. This piece of code was taken from open-instruct (https://github.com/allenai/open-instruct/blob/main/open_instruct/finetune.py#L635); as they mention, although the tokens are already in the model, they are not added as special tokens. This can have implications for quality, since models treat special tokens differently from regular tokens (they are never split, etc.). Hence this change just marks those tokens as special tokens and does not add any new ones.

num_added_tokens will just be the pad token.
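To illustrate the rationale (token strings follow the diff above; the count assumes a Llama tokenizer whose vocabulary already contains <s>, </s>, and <unk>):

num_added_tokens = tokenizer.add_special_tokens(
    {"bos_token": "<s>", "eos_token": "</s>", "unk_token": "<unk>", "pad_token": "<pad>"}
)
# add_special_tokens() returns the number of tokens newly added to the
# vocabulary; here only <pad> is new, so num_added_tokens == 1. The other
# three are registered as special tokens without growing the vocabulary.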

@Ssukriti (Collaborator, Author) added:

A pad token always needs to be added for any model architecture; that is an open-source requirement of SFT Trainer, without which training will not proceed. Hence we always add a pad token for all architectures here: https://github.com/foundation-model-stack/fms-hf-tuning/blob/main/tuning/sft_trainer.py#L278 (except granite, which already has a pad token).

Hence a minimum of 1 token is always added, and hence post-processing is needed.
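A minimal sketch of that generic rule ("if pad token is None, set it"); the actual logic lives in tuning/sft_trainer.py:

# Applies to any architecture whose tokenizer lacks a pad token.
if tokenizer.pad_token is None:
    special_tokens_dict["pad_token"] = "<pad>"
    tokenizer.add_special_tokens(special_tokens_dict)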


tests/utils/test_merge_model_utils.py (resolved, outdated)
tuning/utils/merge_model_utils.py (resolved)
anhuong and others added 5 commits September 25, 2024 14:00
Signed-off-by: Anh Uong <anh.uong@ibm.com>
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
@Ssukriti (Collaborator, Author) commented:

@ashokponkumar as explained on Slack, we cannot remove the addition of the pad token, as it is a known requirement in open source. Without it we will run into this error: https://stackoverflow.com/questions/70544129/transformers-asking-to-pad-but-the-tokenizer-does-not-have-a-padding-token

Most open-source models do not have a pad token: the new llama3.1, llama3, llama, allam, mixtral, mistral. Hence we have to add at minimum 1 token for all these architectures, which we do in a generic manner: "if pad token is None, set it".

This PR is thus doing the post-processing needed to handle the addition of any token for LoRA inference on vLLM. Without this PR, LoRA inference on vLLM does not work for any of the above architectures.

Even if we remove other tokens, like unk, etc. (which can be done in a follow-up PR and issue), the change is still needed for the pad token. Hence it is better to keep the change generic and return the number of added tokens from the code.

Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com>
@anhuong (Collaborator) left a comment

LGTM 🚀

@Ssukriti Ssukriti enabled auto-merge (squash) September 25, 2024 21:11
@Ssukriti Ssukriti dismissed kmehant’s stale review September 25, 2024 21:13

changes addressed and reviewer not available

@Ssukriti Ssukriti merged commit 7714dfc into main Sep 25, 2024
10 checks passed