
Getting an error when trying to perform SFT on Tiny Llama #8

Closed
survivebycoding opened this issue Aug 2, 2024 · 10 comments

Comments

@survivebycoding

We are getting this error when trying to execute SFT on TinyLlama:
[rank0]:   if f.read(7) == "version":
[rank0]: File "/usr/lib/python3.10/codecs.py", line 322, in decode
[rank0]:   (result, consumed) = self._buffer_decode(data, self.errors, final)
[rank0]: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc1 in position 70: invalid start byte

However, we had no issues running Llama and Mistral. If you have any idea about this issue, please let us know.

@songmzhang
Owner

songmzhang commented Aug 2, 2024

It is due to the version of the transformers library. We also found this issue in our early experiments: the version suggested by TinyLLaMA is 4.31, while we loaded it with transformers == 4.38 and hit this error.

We conjecture that it may be because some weights of LlamaForCausalLM were not initialized from the model checkpoint.

Here is our solution:

  1. Create another environment with transformers == 4.31
  2. Load the model checkpoint and re-save it into another directory (noted as tinyllama_new), i.e., model = AutoModelForCausalLM.from_pretrained(original_tinyllama_path) and model.save_pretrained(tinyllama_new, safe_serialization=False); see the sketch after this list.
  3. Switch to the original environment for this project and load the model checkpoint from tinyllama_new.
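A minimal sketch of steps 1–2, assuming transformers == 4.31 is installed in the new environment; both paths below are placeholders for your local directories:

```python
# Run inside the environment with transformers == 4.31.
# Both paths are placeholders for illustration.
from transformers import AutoModelForCausalLM

original_tinyllama_path = "/path/to/tinyllama-1.1b-3T"    # where TinyLlama was downloaded
tinyllama_new = "/path/to/tinyllama-1.1b-3T-resaved"      # directory for the re-saved checkpoint

# Load with the older transformers version so all weights are initialized from the checkpoint,
# then re-save as plain PyTorch .bin files (safe_serialization=False).
model = AutoModelForCausalLM.from_pretrained(original_tinyllama_path)
model.save_pretrained(tinyllama_new, safe_serialization=False)
```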

@songmzhang
Owner

Thanks for reminding us of this issue; we will add it to the README.md.

@survivebycoding
Author

original_tinyllama_path - is this the path on our system where we have downloaded TinyLlama?

@songmzhang
Owner

> original_tinyllama_path - is this the path on our system where we have downloaded TinyLlama?

Yes.

@survivebycoding
Author

survivebycoding commented Aug 5, 2024

[screenshot: contents of the newly created TinyLlama directory]
The newly created TinyLlama directory does not contain the same files as the original.

Getting the error:
OSError: Can't load tokenizer for '/tinyllama/tinyllama-1.1b-3T'. If you were trying to load it from 'https://huggingface.co/models', make sure you don't have a local directory with the same name. Otherwise, make sure '/tinyllama/tinyllama-1.1b-3T' is the correct path to a directory containing all relevant files for a LlamaTokenizerFast tokenizer.

@songmzhang
Owner

> The newly created TinyLlama directory does not contain the same files as the original.
>
> Getting the error:
> OSError: Can't load tokenizer for '/tinyllama/tinyllama-1.1b-3T'. If you were trying to load it from 'https://huggingface.co/models', make sure you don't have a local directory with the same name. Otherwise, make sure '/tinyllama/tinyllama-1.1b-3T' is the correct path to a directory containing all relevant files for a LlamaTokenizerFast tokenizer.

You need to copy tokenizer files (i.e., special_tokens_map.json, tokenizer_config.json, tokenizer.json, tokenizer.model) to the new directory to load the tokenizer.
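A minimal sketch of that copy step, reusing the placeholder paths from the re-save sketch above:

```python
# Copy the tokenizer files from the original TinyLlama directory into the re-saved one.
# Both paths are placeholders for illustration.
import shutil
from pathlib import Path

original_dir = Path("/path/to/tinyllama-1.1b-3T")
new_dir = Path("/path/to/tinyllama-1.1b-3T-resaved")

for name in ("special_tokens_map.json", "tokenizer_config.json", "tokenizer.json", "tokenizer.model"):
    src = original_dir / name
    if src.exists():   # some checkpoints ship only a subset of these files
        shutil.copy(src, new_dir / name)
```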

@survivebycoding
Author

survivebycoding commented Aug 5, 2024

Full fine-tuning eval scripts are under the gpt2 folder of scripts, and the LoRA ones are under the tinyllama scripts? Or is LoRA only applicable to TinyLlama?

This part is a bit confusing; could you clarify what ckpt_path and lora_adapter_path are? Also, what value is usually given for eval_batch_size?

@survivebycoding
Author

survivebycoding commented Aug 5, 2024

In vanilla_KD_tinyllama, TEACHER_PEFT_PATH="path_to_teacher_sft_lora_ckpt" is the path to the folder called epoch9, right?

@songmzhang
Owner

> Full fine-tuning eval scripts are under the gpt2 folder of scripts, and the LoRA ones are under the tinyllama scripts? Or is LoRA only applicable to TinyLlama?
>
> This part is a bit confusing; could you clarify what ckpt_path and lora_adapter_path are? Also, what value is usually given for eval_batch_size?

No, run_eval_lora.sh is not only applicable to TinyLLaMA. The scripts and README.md are just for re-implementing the experiments in our paper. Actually, you can also evaluate other fully fine-tuned checkpoints besides GPT2 with run_eval.sh.

Here, CKPT_PATH means the path of the fully fine-tuned checkpoint. For example, in your case, it is the path of "epoch9_step...".

Similarly, LORA_ADAPTER_PATH is the path of the LoRA adapter, whose name has the same format as the fully fine-tuned checkpoint, e.g., "epoch9_step...".

For EVAL_BATCH_SIZE, it depends on the available GPU memory and the number of model parameters. For example, we use 32 for GPT2-base and 8 for TinyLLaMA.
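As an illustration of what the two paths point to (a sketch only; the directory names and the use of peft's PeftModel here are assumptions, not the repository's actual eval code):

```python
# Sketch of how a fully fine-tuned checkpoint vs. a LoRA adapter would be loaded for evaluation.
# All paths are placeholders for illustration.
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Full fine-tuning: CKPT_PATH is a complete checkpoint directory (e.g. an "epoch9_step..." folder)
# and can be loaded directly.
ckpt_path = "/path/to/outputs/gpt2/sft/epoch9_step_xxx"
model = AutoModelForCausalLM.from_pretrained(ckpt_path)

# LoRA: LORA_ADAPTER_PATH contains only the adapter weights (same "epoch9_step..." naming),
# so the base model is loaded first and the adapter is applied on top of it.
base_model = AutoModelForCausalLM.from_pretrained("/path/to/tinyllama-1.1b-3T")
lora_adapter_path = "/path/to/outputs/tinyllama/sft_lora/epoch9_step_xxx"
model = PeftModel.from_pretrained(base_model, lora_adapter_path)
```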

@songmzhang
Owner

> In vanilla_KD_tinyllama, TEACHER_PEFT_PATH="path_to_teacher_sft_lora_ckpt" is the path to the folder called epoch9, right?

Yes.
