Please future-proof clean_up_tokenization_spaces
#2922
Comments
This warning occurs when the kwargs passed to AutoTokenizer.from_pretrained do not include a value for clean_up_tokenization_spaces. To prevent the warning from being issued, clean_up_tokenization_spaces needs to be added to all AutoTokenizer.from_pretrained calls used within sentence_transformers (see the sketch after this comment). For example, we can confirm that the warning is generated by examples/unsupervised_learning/TSDAE/train_stsb_tsdae.py:

$ python examples/unsupervised_learning/TSDAE/train_stsb_tsdae.py
/Users/username/project/sentence-transformers/.venv/lib/python3.8/site-packages/transformers/tokenization_utils_base.py:1600: FutureWarning: `clean_up_tokenization_spaces` was not set. It will be set to `True` by default. This behavior will be depracted in transformers v4.45, and will be then set to `False` by default. For more details check this issue: https://github.com/huggingface/transformers/issues/31884
  warnings.warn(

To avoid the warning in this process, clean_up_tokenization_spaces can be passed explicitly to AutoTokenizer.from_pretrained.
The warning message states that the default value will be changed to False in future versions, but according to this pull request (huggingface/transformers#31938), it seems that no action is needed for BERT-based models.
I'm not sure about the full scope of impact, so I can't say whether this response is correct, but I believe this approach would avoid the warning. |
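Assuming the current sentence-transformers API, a minimal sketch of the kind of change suggested above might look like the following. The tokenizer_args route is how models.Transformer forwards kwargs to AutoTokenizer.from_pretrained; parameter names can differ between versions, so treat this as illustrative rather than the actual patch:

```python
from transformers import AutoTokenizer
from sentence_transformers import SentenceTransformer, models

# Plain transformers: extra kwargs are forwarded to the tokenizer, so setting the
# value explicitly silences the FutureWarning.
tokenizer = AutoTokenizer.from_pretrained(
    "bert-base-uncased",
    clean_up_tokenization_spaces=False,  # pin the value instead of relying on the default
)

# sentence-transformers: models.Transformer exposes tokenizer_args, which are passed
# on to AutoTokenizer.from_pretrained when the module is built.
word_embedding_model = models.Transformer(
    "bert-base-uncased",
    tokenizer_args={"clean_up_tokenization_spaces": False},
)
pooling = models.Pooling(word_embedding_model.get_word_embedding_dimension())
model = SentenceTransformer(modules=[word_embedding_model, pooling])
```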
cc @tomaarsen, if you need insight on that, tell me! |
@ArthurZucker I'm considering following @pesuchin's recommendation and adding it. I think hardcoding cc @itazap
|
Hey! This is the future PR to deprecate to In terms of future models, the |
Please, how do I solve this error? |
@pradip292 what is the error you are experiencing? This warning is expected in order to communicate the future deprecation |
After this warning, my Streamlit app automatically stops |
@pradip292 Can you paste the error? Perhaps your Streamlit app needs to suppress warnings to render, but the warning being present shouldn't result in an error |
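If the goal is simply to keep the message out of a Streamlit app's output, a hedged sketch of filtering just this FutureWarning (rather than all warnings) could look like this; the regex and the placement at the top of the script are assumptions, not something sentence-transformers requires:

```python
import warnings

# Filter only this FutureWarning; do this before importing/loading the model so the
# filter is active when the tokenizer is created.
warnings.filterwarnings(
    "ignore",
    category=FutureWarning,
    message=r".*clean_up_tokenization_spaces.*",
)

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/all-mpnet-base-v2")
```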
I did it, but I am still facing the same error |
I am getting the warning but my application is working perfectly fine. |
I will check and update you after some time |
clean_up_tokenization_spaces |
@SDArtz @pradip292 thanks for providing the output, this output is an expected warning that we want to display, it is not an error |
Now this is another error I am facing. I have tried many options, but it is still showing this error. What should I do? -> raise SSLError(e, request=request) |
@pradip292 Are you able to browse to this URL: https://huggingface.co/sentence-transformers/all-mpnet-base-v2/resolve/main/config.json ? It seems that you were (temporarily or otherwise) unable to automatically download this file, which is required to initialize the model. It is unrelated to the clean_up_tokenization_spaces warning.
|
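For anyone who wants to check reachability from the same environment the script runs in, here is a small hedged sketch using plain requests (nothing sentence-transformers specific); an SSLError raised here points to the network/SSL setup rather than the library:

```python
import requests

# Try to fetch the config file that sentence-transformers needs at model load time.
url = "https://huggingface.co/sentence-transformers/all-mpnet-base-v2/resolve/main/config.json"
response = requests.get(url, timeout=10)
print(response.status_code)  # 200 means the file is reachable from this machine
```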
Thanks for the answer! This sounds like I should indeed defer to
|
Yes, I am facing that issue on my laptop only; my friends who tried it on their laptops got it to work, but I am facing this issue and I don't know how to deal with it. I have downloaded all the SSL files and everything, but it still fails. Help me please. |
@tomaarsen yes exactly - the deprecation will maintain |
This is probably caused by a firewall (man-in-the-middle) that replaces SSL certificates. We had similar errors with our company firewall. Check this out. |
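One hedged workaround sketch for that situation, assuming your IT department can provide the corporate root certificate: point the HTTP stack at that CA bundle before downloading the model. The path below is a placeholder, not a real file:

```python
import os

# Placeholder path: use the CA bundle your firewall/IT team provides.
os.environ["REQUESTS_CA_BUNDLE"] = "/path/to/corporate-ca-bundle.pem"

# Import after setting the variable so the underlying requests sessions pick it up.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/all-mpnet-base-v2")
```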
I am also facing the same issue on my laptop and don't know how to solve it. |
Issue is solved :-) |
What is the solution? Can you please let me know? |
Actually, that model is not working for me, so I am using another Hugging Face model; there are different models out there. As for the SSL errors, I just needed to download some files that the code uses, which were missing. I took the help of ChatGPT and was able to solve that error. -> Sorry for my English, I am a student. |
Okay, I will try it out, thanks!! |
I just added clean_up_tokenization_spaces=False directly in BertTokenizer. |
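For reference, a minimal sketch of that approach (the model name here is just an example):

```python
from transformers import BertTokenizer

# Setting the value explicitly avoids the "clean_up_tokenization_spaces was not set" warning.
tokenizer = BertTokenizer.from_pretrained(
    "bert-base-uncased",
    clean_up_tokenization_spaces=False,
)
print(tokenizer.clean_up_tokenization_spaces)  # False
```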
Any news on a solution for the original issue? |
Not yet; for the time being I changed the vector database to FAISS and the Groq model to llama3-8b-8192 |
This is the future warning we are currently receiving:

transformers\tokenization_utils_base.py:1601: FutureWarning: `clean_up_tokenization_spaces` was not set. It will be set to `True` by default. This behavior will be depracted in transformers v4.45, and will be then set to `False` by default. For more details check this issue: huggingface/transformers#31884