
Fixing deprecation warnings in new versions of pyterrier and transformers #10

Open
wants to merge 9 commits into main

Conversation

CodingTil

In the latest version of pyterrier it is recommended to create new transformers by extending pt.Transformer directly, instead of pt.TransformerBase.[1] The old approach now throws a deprecation warning. This MR fixes this by extending pt.Transformer.
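
For illustration, a minimal sketch of the recommended style (the class name and pass-through logic here are hypothetical, not part of this MR):

```python
import pandas as pd
import pyterrier as pt

# Extend pt.Transformer directly; subclassing pt.TransformerBase
# now emits a deprecation warning.
class IdentityReranker(pt.Transformer):
    """Toy transformer that passes results through unchanged."""

    def transform(self, inp: pd.DataFrame) -> pd.DataFrame:
        return inp.copy()
```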

In the latest version of transformers, an issue regarding the T5 tokenizers has been fixed.[2] That change in the transformers package altered how sentences are tokenized (regarding model_max_length, max_length, and truncation for auto-truncation), i.e., you must now be explicit. Legacy behavior is still supported, and remains the default if model_max_length is not set, but it results in a deprecation warning. I've fixed this by adding model_max_length as a parameter to the monoT5 and duoT5 transformers, along with a parameter for the truncation setting, defaulting to auto-truncation.[3]
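
Roughly, the explicit style looks like this (a sketch: the model name matches monoT5's default, but the exact parameter plumbing in this MR may differ, and 512 is an assumed value):

```python
from transformers import T5Tokenizer

# Being explicit about model_max_length and truncation avoids the
# legacy-behavior deprecation warning.
model_max_length = 512  # assumed value, exposed as a parameter on monoT5/duoT5
tokenizer = T5Tokenizer.from_pretrained(
    "castorini/monot5-base-msmarco", model_max_length=model_max_length
)
enc = tokenizer(
    "Query: q Document: d Relevant:",
    max_length=model_max_length,
    truncation=True,  # explicit auto-truncation, the default in this MR
    return_tensors="pt",
)
```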

I have not (yet) updated this package's version (currently 0.0.1). I am unsure whether keeping it at 0.0.1 is deliberate.

Furthermore, I have slightly updated the versions in the readme.txt and added more build tests to the GitHub workflow.

@cmacdonald
Contributor

Hi @CodingTil, we'll have a look at this. Does the change prevent older transformers versions from working? We try to be fairly conservative in upgrading transformers.

Craig

@CodingTil
Author

CodingTil commented Oct 5, 2023

Hey @cmacdonald. I've just checked this:
For this, I created an entirely new conda environment with Python 3.11.
In this environment, I installed transformers=4.3.0, as well as packaging=21 (the newest version is incompatible with the old transformers code).
Pytest succeeded here.

Unfortunately, I am unable to test whether transformers>=4.0.0,<4.3 works. The reason is that I have an M1 Mac, and during the installation process (specifically for the package tokenizers>=0.10.1,<0.11), a Rust compilation step is required, which does not consider aarch64-apple-darwin a valid target, although it is (nowadays) and is part of my rustup toolchain.

EDIT:
If you want to reproduce this, make sure to install transformers=4.3.0 (using --upgrade --force-reinstall) after installing the package locally (pip install -e .), since the local package installation will otherwise install transformers>=4.3.0

@CodingTil
Author

@cmacdonald After a lot of trial and error (see commits), I could finally replicate an environment from the time when transformers 4.0.0 was released (November 30th, 2020). I've created a new GitHub workflow job that checks whether pytest runs successfully with the old version, and it does.

So, I've changed the transformers minimum required version back to 4.0.0.
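
For context, the relevant dependency pin presumably now reads something like this (a hypothetical excerpt; the actual file contents are not shown in this thread):

```python
# Hypothetical excerpt of the package's setup.py after this change:
# the transformers lower bound is relaxed back to 4.0.0.
install_requires = [
    "transformers>=4.0.0",
]
```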

@cmacdonald
Contributor

Hi @CodingTil

I'm sorry that this dropped off our radar; we'll try to get this merged soon. Sorry about that!

Craig
