added intra and inter op parallelism parameters to the huggingface … #1081
Conversation
@adriangonz could you please have a look at the failed test and let me know what you think might be the problem? I think it is some problem with having tensorflow and torch in the requirements, but I couldn't figure it out.
Hey @saeid93, it seems there's something going on with the packages. It's unclear whether it's been triggered by the changes in https://github.com/SeldonIO/MLServer/actions/runs/4647262874/jobs/8224160766?pr=1081#step:6:338
runtimes/huggingface/setup.py (outdated)
@@ -36,6 +36,7 @@ def _load_description() -> str:
    install_requires=[
        "mlserver",
        "optimum[onnxruntime]>=1.4.0, <1.8.0",
        "tensorflow==2.12.0",
I wouldn't pin it here, as that will cause incompatibilities with other runtimes which are not yet using TF 2.12.
Do you know why we need to specify it explicitly? I thought optimum already depended on tensorflow?
No, it is not part of optimum[onnxruntime]; here is the link. It is part of some other installation extras of Optimum, but not the optimum[onnxruntime] extra used in MLServer. I spent some time yesterday figuring out the problem and tried testing it locally. The problem is that, for some yet unknown reason, having TensorFlow here or in requirements/dev.txt results in pip and tox getting stuck backtracking during version resolution. I've tried a number of changes to the dependency versions and still no luck. I think we can do either of the following:
- Remove TensorFlow support for this feature and add a warning message for users requesting it with TensorFlow (most HF models are PyTorch these days anyway).
- Keep working on the TF scenario if you think both frameworks should be supported and partial support for one framework is not acceptable.
@adriangonz I added the first option; please let me know if there is any problem. I'll look into TensorFlow support in the future.
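For illustration, here is a minimal sketch of what that first option could look like. The settings object and its field names (framework, intra_op_threads, inter_op_threads) are hypothetical and not the actual MLServer or PR code; only the torch calls are real framework APIs.

import logging

import torch

logger = logging.getLogger(__name__)

def apply_parallelism_settings(settings) -> None:
    # Hypothetical settings object; field names are illustrative only.
    if settings.framework == "tf":
        # TensorFlow support for these flags is not wired up, so only warn
        # instead of configuring any threading options.
        logger.warning(
            "inter_op/intra_op parallelism settings are currently ignored "
            "for TensorFlow models; only PyTorch models are supported."
        )
        return

    # For PyTorch, these real torch APIs control the intra-op and inter-op
    # thread pools respectively.
    if settings.intra_op_threads is not None:
        torch.set_num_threads(settings.intra_op_threads)
    if settings.inter_op_threads is not None:
        torch.set_num_interop_threads(settings.inter_op_threads)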
Hey @saeid93,
Just as an update, we merged a PR today which should make your life easier on this one:
Essentially, what was going on was:
- At the top-level MLServer package, we had to pin protobuf to >=3.20.2 to avoid a CVE with previous versions.
- However, TF <2.12 is very stubborn about not using that version of protobuf. In fact, pip keeps backtracking until it gets to TF 2.9.0 (which mysteriously didn't have that protobuf constraint, so it is happy).
- This led to MLServer indirectly forcing an older version of TF which was incompatible with Alibi-Detect and a few others.
- The fix has been to move that protobuf constraint to the official Docker images, letting the user control what goes into their local environment (which could include the faulty version of protobuf).
Based on the above, what I would try is to rebase from master and then remove the extra tensorflow dep in runtimes/huggingface/setup.py. That should hopefully let huggingface > transformers pick up the correct version of TF.
Hey @adriangonz,
Thank you for the details. Yes, that's exactly what I was facing!
The only remaining part is that I need to include a TensorFlow installation for the tests, since it is not included in the dependencies (it isn't part of the onnx runtime extra either, as I mentioned earlier in #1081 (comment)), yet the runtime still does import tensorflow as tf.
I first added it to dev/requirements.txt and also tried runtimes/huggingface/setup.py, but based on your feedback I think neither is intended for this. Could you let me know where I can add it so I don't hit the following error in the tests?
====================================================================================== ERRORS ======================================================================================
__________________________________________________________________________ ERROR collecting test session ___________________________________________________________________________
runtimes/huggingface/tests/test_tasks/conftest.py:2: in <module>
from ..utils import file_path, file_bytescontent
runtimes/huggingface/tests/utils.py:6: in <module>
from mlserver_huggingface.codecs.image import _pil_base64encode, _pil_base64decode
runtimes/huggingface/mlserver_huggingface/__init__.py:1: in <module>
from .runtime import HuggingFaceRuntime
runtimes/huggingface/mlserver_huggingface/runtime.py:12: in <module>
from .common import load_pipeline_from_settings
runtimes/huggingface/mlserver_huggingface/common.py:10: in <module>
import tensorflow
E ModuleNotFoundError: No module named 'tensorflow'
Hey @adriangonz,
Thank you for the description. I tested several times, and I highly doubt that it gets installed automatically or that there is any problem with my environment. Are you sure that tensorflow is not already installed in your environment from before? I checked with several fresh envs, and also with the MLServer HuggingFace tox environment, and tensorflow was not installed automatically in either of them.
In the transformers setup.py, the _deps list is only used as a reference for finding the correct version of each package (see here). The resulting deps dictionary, built by applying the regular expression, maps each package name to its version constraint. Not everything in that list will necessarily be installed. What transformers actually installs is the install_requires list, which is composed from the deps entries built above.
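A simplified sketch of the pattern described above (not a verbatim copy of the transformers setup.py; the version specs below are illustrative):

import re

# Reference table of pinned specs; this list by itself installs nothing.
_deps = [
    "tensorflow>=2.4,<2.13",
    "torch>=1.9",
    "tokenizers>=0.11.1",
]

# Map package name -> full version spec, e.g. "torch" -> "torch>=1.9".
deps = {re.match(r"^[^!=<>~ ]+", spec).group(0): spec for spec in _deps}

def deps_list(*pkgs):
    # Only packages explicitly requested here end up in install_requires
    # (or in an extra); everything else in _deps is never installed.
    return [deps[pkg] for pkg in pkgs]

install_requires = deps_list("tokenizers")  # tensorflow/torch stay optional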
Another sign of this is that, if you look at the same optimum code that you sent me, they have also pinned PyTorch there, even though it is present in the same _deps list that you mentioned in that setup.py. If PyTorch were installed automatically, there would have been no need to add it there in the Optimum library.
If you think pinning tensorflow could cause problems, maybe we can revert back to not supporting tensorflow, or go with any other option you think might be useful. I'm happy to investigate further if you have any suggestions.
Oh I see! You're totally right. Thanks for that thorough explanation @saeid93. I saw tensorflow there and I've been wrongly assuming all this time that the transformers package would also install it.
In that case, I would suggest adding tensorflow there as a dep but without pinning it to any specific version? Or did you run into issues when trying that out?
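In runtimes/huggingface/setup.py, that suggestion could look roughly like the following (the optimum bounds come from the earlier diff; the unpinned tensorflow entry reflects the suggestion above, not necessarily the final merged code):

install_requires = [
    "mlserver",
    "optimum[onnxruntime]>=1.4.0, <1.8.0",
    "tensorflow",  # intentionally unpinned; let pip resolve a compatible version
]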
@adriangonz Sure, glad to be of any help. I loosened the tf version; however, pip still backtracks until it gets to the version that I had previously pinned. Please have a look now.
Yup, it seems that protobuf is still interacting with the version of TF, even though it's not pinned by the main mlserver package anymore. I think it's OK if it still defaults to TF 2.9 though; not much we can do there (pinning it could cause plenty of incompatibilities with the other runtime deps).
So as it is right now (i.e. without any pin), it's probably as good as it gets! Thanks a lot for your help troubleshooting this one (and for pointing me in the right direction regarding the transformers > tensorflow dep).
Sure, no problem. I applied your comments; I think it should be good to go now.
Added a couple of minor comments around the inter_op and intera_op flag names and docstring, but everything else looks great!
Changes look great! This should be good to go now. Thanks a lot for addressing those comments and for your insights on the point about the tensorflow dependency.
Fixes #961
As there is no universal solution for setting these two variables, I think exposing them to the user is the best solution for now. For containerized environments, as I showed in the experiments, the number of cores is the best heuristic, but that's not the general case, as MLServer could also be deployed on non-containerized platforms.
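As an illustration of the idea (not the PR's actual implementation), the sketch below shows how user-provided inter-op/intra-op values could be applied for PyTorch, with the number of available cores as a containerized-environment default. The function name and defaults are hypothetical; torch.set_num_threads and torch.set_num_interop_threads are the real framework APIs involved (TensorFlow has analogous tf.config.threading setters).

import os
from typing import Optional

import torch

def configure_parallelism(intra_op: Optional[int] = None, inter_op: Optional[int] = None) -> None:
    # Heuristic default for containerized environments: the number of CPU
    # cores actually available to this process (falls back to os.cpu_count()
    # on platforms without sched_getaffinity).
    if hasattr(os, "sched_getaffinity"):
        available_cores = len(os.sched_getaffinity(0))
    else:
        available_cores = os.cpu_count() or 1

    # Intra-op parallelism: threads used within a single operator.
    torch.set_num_threads(intra_op or available_cores)
    # Inter-op parallelism: threads used to run independent operators
    # concurrently. Must be set before any inter-op work has started.
    torch.set_num_interop_threads(inter_op or 1)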