
'Llama tokenizer' issues #29

Open
wizaaaard opened this issue Sep 19, 2024 · 0 comments

Comments

@wizaaaard

Dear author, thank you for your contribution to this community. I encountered a warning message like the following when running the run_dpo.sh script in your code:

You are using the legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. This means that tokens that come after special tokens will not be properly handled. We recommend you to read the related pull request available at huggingface/transformers#24565

I only swapped in my own dataset; the model is still the default "ehartford/Wizard-Vicuna-7B-Uncensored" from the configuration file. May I ask what this warning means and why it is generated? Will it affect the accuracy of my training? Thank you for your reply.
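For reference, here is a minimal sketch (not taken from this repository) of how the non-legacy behaviour can be opted into when loading the tokenizer with a recent transformers release; the `legacy` argument is the switch the warning refers to:

```python
# Minimal sketch, not from the repository's code: opting into the non-legacy
# LlamaTokenizer behaviour described in huggingface/transformers#24565.
from transformers import LlamaTokenizer

# "ehartford/Wizard-Vicuna-7B-Uncensored" is the model named in the config file;
# legacy=False selects the fixed handling of tokens that follow special tokens.
tokenizer = LlamaTokenizer.from_pretrained(
    "ehartford/Wizard-Vicuna-7B-Uncensored",
    legacy=False,
)

# Quick check: encode a string containing a special token and inspect the ids.
print(tokenizer.encode("Hello</s>world"))
```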
