
'Llama tokenizer' issues #29

Open
wizaaaard opened this issue Sep 19, 2024 · 0 comments

Comments

@wizaaaard

Dear author, thank you for your contribution to this community. I encountered a warning message like the following when running the run_dpo.sh script in your code:

You are using the legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. This means that tokens that come after special tokens will not be properly handled. We recommend you to read the related pull request available at huggingface/transformers#24565

I only swapped in my own dataset; the model is still the default "ehartford/Wizard-Vicuna-7B-Uncensored" from the configuration file. May I ask what this warning means and why it is generated? Will it affect the accuracy of my training? Thank you for your reply.
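For reference, here is a minimal sketch (not taken from this repository) of how the non-legacy behaviour can be opted into when loading the tokenizer with a recent transformers release; the `legacy` argument is the switch the warning refers to:

```python
# Minimal sketch, not from the repository's code: opting into the non-legacy
# LlamaTokenizer behaviour described in huggingface/transformers#24565.
from transformers import LlamaTokenizer

# "ehartford/Wizard-Vicuna-7B-Uncensored" is the model named in the config file;
# legacy=False selects the fixed handling of tokens that follow special tokens.
tokenizer = LlamaTokenizer.from_pretrained(
    "ehartford/Wizard-Vicuna-7B-Uncensored",
    legacy=False,
)

# Quick check: encode a string containing a special token and inspect the ids.
print(tokenizer.encode("Hello</s>world"))
```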
