Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

upgrading to python 3.8 #4910

Merged

Conversation

asingh9530
Copy link
Contributor

What this PR does / why we need it:

This PR is to upgrade pre-packaged servers from python 3.7 to python 3.8.

Which issue(s) this PR fixes:

Issue can be found here

Special notes for your reviewer:

@adriangonz Let's work on this PR to upgrade all pre-packaged servers.

@seldondev
Copy link
Collaborator

Hi @Abhinavfreecodecamp. Thanks for your PR.

I'm waiting for a SeldonIO or todo member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the jenkins-x/lighthouse repository.

@adriangonz
Copy link
Contributor

adriangonz commented Jun 12, 2023

Hey @Abhinavfreecodecamp ,

Keep in mind that the explainer / detect servers will be upgraded on a separate PR (see #4912) - so I'd remove any changes there from this PR. If you're happy to make the changes to the pre-packaged servers here, I can review once they're done - there's no need to collaborate on the same PR as that tends to confuse things.

I'd also flag this as a WIP PR until it's ready.

@asingh9530
Copy link
Contributor Author

Hi @adriangonz,

Sure, let me remove any changes to done to explainer/detect servers in this PR and this PR changes will be limited to /server dir. Yes WIP for this PR is fine until its ready.

@ukclivecox ukclivecox added the v1 label Jun 17, 2023
@adriangonz
Copy link
Contributor

Hey @Abhinavfreecodecamp ,

Following up on this one, would you be able to continue working on this PR? If not, that's totally understandable - we can always pick this up internally to finalise it ahead of the next release of SC.

@asingh9530
Copy link
Contributor Author

asingh9530 commented Jun 29, 2023

@adriangonz , I have made all the necessary changes could you review it. Also I am still not sure how you update these to docker hub.

Copy link
Contributor

@adriangonz adriangonz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice one @Abhinavfreecodecamp ! Thanks for taking the time for this contribution.

I've added a couple comments below. Other than those, I think we're good to start kicking off the CI pipelines (which will build the images, run integration tests, etc. ).

servers/sklearnserver/sklearnserver/requirements.txt Outdated Show resolved Hide resolved
servers/xgboostserver/xgboostserver/requirements.txt Outdated Show resolved Hide resolved
@adriangonz
Copy link
Contributor

/test integration

@@ -1,4 +1,4 @@
seldon_core
scikit-learn == 0.24.2
numpy >= 1.8.2
scikit-learn == 1.1.0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be possible to keep the previous versions?

Copy link
Contributor Author

@asingh9530 asingh9530 Jun 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it will be possible. Should I revert the changes to requirements ? My reason was the updated version will support up untill 3.10 so in future we will not be required to update these.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would keep it to the same versions as in master to reduce the number of changes

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i.e.

scikit-learn==0.24.2, numpy>=1.8.2 and same would apply to the xgboost reqs

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@adriangonz sure let me revert back changes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@adriangonz in master for xgboost server the requirements are follows should I keep it same ?

scikit-learn == 0.20.3
numpy >= 1.8.2
xgboost == 1.4.2

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, let's keep the same as in master for both

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@adriangonz Done.

@asingh9530
Copy link
Contributor Author

/test integration

@seldondev
Copy link
Collaborator

@Abhinavfreecodecamp: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.

In response to this:

/test integration

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the jenkins-x/lighthouse repository.

Copy link
Contributor

@adriangonz adriangonz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for making those changes! PR looks good - once tests are green, it should be good to go.

@adriangonz
Copy link
Contributor

/test integration

1 similar comment
@adriangonz
Copy link
Contributor

/test integration

@asingh9530
Copy link
Contributor Author

@adriangonz Wanted to know why this integration test failed ?

@asingh9530
Copy link
Contributor Author

asingh9530 commented Jul 1, 2023

@adriangonz any idea why integration has failed as for the mentioned failed commit, since it just revert back the changes ?

@asingh9530
Copy link
Contributor Author

@adriangonz can you please help here, so that we can merge this PR.

@adriangonz
Copy link
Contributor

@Abhinavfreecodecamp most likely is due to flakiness on the integration tests setup

@adriangonz
Copy link
Contributor

/test integration

@asingh9530
Copy link
Contributor Author

@adriangonz I think it failed again 😅 , is it possible to share logs for the test so I can help with that.

@adriangonz
Copy link
Contributor

/test integration

@adriangonz
Copy link
Contributor

@Abhinavfreecodecamp so far it's just failing before the setup stage - so nothing to do with the scope of this PR. Tests seem flakier than usual for some reason.

Once it goes beyond that I'll share the logs if failures are relevant to this PR 👍

@asingh9530
Copy link
Contributor Author

@adriangonz it failed again 😅. ?

@adriangonz
Copy link
Contributor

adriangonz commented Jul 3, 2023

Ok - now it's a more relevant error:

___________________________ TestPrepack.test_xgboost ___________________________
[gw0] linux -- Python 3.7.10 /opt/conda/bin/python

self = <test_prepackaged_servers.TestPrepack object at 0x7f42690ece10>
namespace = 'test-xgboost'

    def test_xgboost(self, namespace):
        spec = "../../servers/xgboostserver/samples/iris.yaml"
        retry_run(f"kubectl apply -f {spec}  -n {namespace}")
        wait_for_status("xgboost", namespace)
>       wait_for_rollout("xgboost", namespace)

test_prepackaged_servers.py:108: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
seldon_e2e_utils.py:161: in wait_for_rollout
    wait_for_deployment(deployment_name, namespace, attempts, sleep)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

deployment_name = 'xgboost-default-0-classifier', namespace = 'test-xgboost'
attempts = 50, sleep = 5

    def wait_for_deployment(deployment_name, namespace, attempts=50, sleep=5):
        logging.info(f"Waiting for deployment {deployment_name}")
        for _ in range(attempts):
            ret = run(
                f"kubectl rollout status -n {namespace} deploy/{deployment_name}",
                shell=True,
            )
            if ret.returncode == 0:
                logging.info(f"Successfully waited for deployment {deployment_name}")
                break
            logging.warning(f"Unsuccessful wait command but retrying for {deployment_name}")
            time.sleep(sleep)
        assert (
            ret.returncode == 0
>       ), f"Wait for rollout of {deployment_name} failed: non-zero return code"
E       AssertionError: Wait for rollout of xgboost-default-0-classifier failed: non-zero return code

seldon_e2e_utils.py:140: AssertionError

This error is coming from the integration tests that you can find in this folder:

# Test prepackaged server for xgboost
def test_xgboost(self, namespace):
spec = "../../servers/xgboostserver/samples/iris.yaml"
retry_run(f"kubectl apply -f {spec} -n {namespace}")
wait_for_status("xgboost", namespace)
wait_for_rollout("xgboost", namespace)
time.sleep(1)
logging.warning("Initial request")
r = initial_rest_request(
"xgboost", namespace, data=[[0.1, 0.2, 0.3, 0.4]], dtype="ndarray"
)
assert r.status_code == 200
r = rest_request_ambassador("xgboost", namespace, method="metadata")
assert r.status_code == 200
res = r.json()
logging.warning(res)
assert res["name"] == "xgboost-iris"
assert res["versions"] == ["xgboost-iris/v1"]
r = grpc_request_ambassador(
"xgboost", namespace, data=np.array([[0.1, 0.2, 0.3, 0.4]])
)
res = json.loads(json_format.MessageToJson(r))
logging.info(res)
logging.warning("Success for test_prepack_xgboost")
run(f"kubectl delete -f {spec} -n {namespace}", shell=True)

Setting up the integration tests to run locally is a bit fiddly - so I'd probably just try running the XGBoost artefact that's used by that test (i.e. gs://seldon-models/xgboost/iris) and see if you can replicate an error. Otherwise, happy to try it out locally.

@asingh9530
Copy link
Contributor Author

@adriangonz let me setup the tests locally.

@adriangonz
Copy link
Contributor

Sure @Abhinavfreecodecamp - to run the tests locally, you will need to run the make -C testing/scripts kind_create_cluster kind_setup rules. These assume that KinD is already present locally (you will need to install it otherwise).

Besides that, you will also need to build and load your new images onto the cluster (which you can do with make -C testing/scripts kind_build_prepackaged).

Once the above is done (i.e. SC is running on a Kind cluster, where your new images have been loaded) you will be able to run the failing test.

@asingh9530
Copy link
Contributor Author

asingh9530 commented Jul 4, 2023

Hi @adriangonz ,Please see result below I ran for xgboost I did need to comment rest of the test cases as my docker memory ran out. For xgboost result went fine take a look below. (skipped warning logs)
`

~/Documents/seldon-core/testing/scripts on  upgrading-pre-pacakged-server! ⌚ 22:01:08
$ pytest test_prepackaged_servers.py
================================================================================ test session starts ================================================================================
platform darwin -- Python 3.8.16, pytest-7.3.1, pluggy-1.0.0
rootdir: /Users/abhinav.singh/Documents/seldon-core/testing/scripts
configfile: setup.cfg
plugins: anyio-3.7.0
collected 1 item                                                                                                                                                                    

test_prepackaged_servers.py .                                                                                                                                                 [100%]
==================================================================== 1 passed, 100 warnings in 65.13s (0:01:05) =====================================================================

`

also below is output while running ../../servers/xgboostserver/samples/iris.yaml directly. In both scenerios it is running fine. Hence I am not able to reproduce this error, is it possible for you to try at your end ?

Screenshot 2023-07-04 at 10 10 04 PM

@asingh9530
Copy link
Contributor Author

Hi @adriangonz is it possible this is happening due to k8s configuration and not due to changes in this PR ?

@asingh9530
Copy link
Contributor Author

hi @adriangonz sorry to ask you again, but if it is possible could you test this PR on your local ?

@adriangonz
Copy link
Contributor

Hey @Abhinavfreecodecamp ,

Got the message the first time 🙂

We'll try it out once we get a chance.

@asingh9530
Copy link
Contributor Author

Hey @Abhinavfreecodecamp ,

Got the message the first time 🙂

We'll try it out once we get a chance.

Thanks @adriangonz

@adriangonz
Copy link
Contributor

Hey @Abhinavfreecodecamp ,

Just had a look and was able to trigger the following error when running the XGBoost test:

Traceback (most recent call last):
  File "/opt/conda/bin/seldon-core-microservice", line 8, in <module>
    sys.exit(main())
  File "/opt/conda/lib/python3.8/site-packages/seldon_core/microservice.py", line 610, in main
    interface_file = importlib.import_module(args.interface_name)
  File "/opt/conda/lib/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 848, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/microservice/XGBoostServer.py", line 8, in <module>
    import xgboost as xgb
  File "/opt/conda/lib/python3.8/site-packages/xgboost/__init__.py", line 9, in <module>
    from .core import DMatrix, DeviceQuantileDMatrix, Booster
  File "/opt/conda/lib/python3.8/site-packages/xgboost/core.py", line 23, in <module>
    from .compat import (STRING_TYPES, DataFrame, py_str, PANDAS_INSTALLED,
  File "/opt/conda/lib/python3.8/site-packages/xgboost/compat.py", line 46, in <module>
    from sklearn.base import BaseEstimator
  File "/opt/conda/lib/python3.8/site-packages/sklearn/__init__.py", line 64, in <module>
    from .base import clone
  File "/opt/conda/lib/python3.8/site-packages/sklearn/base.py", line 14, in <module>
    from .utils.fixes import signature
  File "/opt/conda/lib/python3.8/site-packages/sklearn/utils/__init__.py", line 14, in <module>
    from . import _joblib
  File "/opt/conda/lib/python3.8/site-packages/sklearn/utils/_joblib.py", line 22, in <module>
    from ..externals import joblib
  File "/opt/conda/lib/python3.8/site-packages/sklearn/externals/joblib/__init__.py", line 119, in <module>
    from .parallel import Parallel
  File "/opt/conda/lib/python3.8/site-packages/sklearn/externals/joblib/parallel.py", line 28, in <module>
    from ._parallel_backends import (FallbackToBackend, MultiprocessingBackend,
  File "/opt/conda/lib/python3.8/site-packages/sklearn/externals/joblib/_parallel_backends.py", line 22, in <module>
    from .executor import get_memmapping_executor
  File "/opt/conda/lib/python3.8/site-packages/sklearn/externals/joblib/executor.py", line 14, in <module>
    from .externals.loky.reusable_executor import get_reusable_executor
  File "/opt/conda/lib/python3.8/site-packages/sklearn/externals/joblib/externals/loky/__init__.py", line 12, in <module>
    from .backend.reduction import set_loky_pickler
  File "/opt/conda/lib/python3.8/site-packages/sklearn/externals/joblib/externals/loky/backend/reduction.py", line 125, in <module>
    from sklearn.externals.joblib.externals import cloudpickle  # noqa: F401
  File "/opt/conda/lib/python3.8/site-packages/sklearn/externals/joblib/externals/cloudpickle/__init__.py", line 3, in <module>
    from .cloudpickle import *
  File "/opt/conda/lib/python3.8/site-packages/sklearn/externals/joblib/externals/cloudpickle/cloudpickle.py", line 167, in <module>
    _cell_set_template_code = _make_cell_set_template_code()
  File "/opt/conda/lib/python3.8/site-packages/sklearn/externals/joblib/externals/cloudpickle/cloudpickle.py", line 148, in _make_cell_set_template_code
    return types.CodeType(
TypeError: an integer is required (got type bytes)

It seems this is mainly a Python compatibility issue, so it's probably enough to update the XGBoost artefact in the test. These are saved in GCS though, so we'll take it from here to upload a fixed artefact there.

@asingh9530
Copy link
Contributor Author

@adriangonz didn't get it, so the PR is fine but the test using xgboost artefact which is not compatible with py3.8 ? is my understanding correct ?

@adriangonz
Copy link
Contributor

adriangonz commented Jul 7, 2023

@Abhinavfreecodecamp sort of - it seems the XGBoost artefact used in the tests is not compatible with the changes you've made to the XGBoost server, so we'll need to update the artefact in GCS (and as part of this PR, we'll also need to add a change to point to the new artefact).

@asingh9530
Copy link
Contributor Author

@adriangonz got it, actually this is what I thought that's why I updated xgb version previously. Anyway I will list py3.8 compatible xgb below.

@adriangonz adriangonz force-pushed the upgrading-pre-pacakged-server branch from eb3b3fb to e9be1a5 Compare July 7, 2023 11:37
@adriangonz
Copy link
Contributor

Hey @Abhinavfreecodecamp ,

You're right - all that was required was bumping the sklearn version on the xgboost runtime. I've also rebased from master to ensure it uses the latest version.

I'll kick off again the integration tests 👍

@adriangonz
Copy link
Contributor

/test integration

@asingh9530
Copy link
Contributor Author

Thanks @adriangonz

@asingh9530
Copy link
Contributor Author

@adriangonz still failing 🧐 strange as I recall sklearn 1.0 was compatible with py3.8

@adriangonz
Copy link
Contributor

May still be a flaky one - let me have a look

@adriangonz
Copy link
Contributor

/test integration

@adriangonz
Copy link
Contributor

/test notebooks

@adriangonz
Copy link
Contributor

Good news @Abhinavfreecodecamp - it seems like all tests are happy now! Thanks a lot for the time you've put into this one!

@adriangonz adriangonz merged commit e847d4e into SeldonIO:master Jul 8, 2023
@asingh9530
Copy link
Contributor Author

@adriangonz thanks for helping me out.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants