Add PySparkModelArtifact to support Spark MLLib #957
Conversation
These tests are just demos to ensure that Spark has been installed correctly and PySpark can be used. They are not the proper tests for a PySparkSavedModelArtifact, and should be replaced when further code has been written.
* Testing is far from complete (verifying prediction from PySpark model)
* PySparkModelArtifact has many TODOs that need to be addressed.
Were only used to sanity-check that PySpark itself was set up correctly.
Was presented in the API proposal, but I'm unsure of its use. Will discuss with BentoML maintainers when I create a PR.
Used to return a Pandas DataFrame. It is harder to assert values that way, so a NumPy array is returned instead; existing tests are adapted to match.
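A minimal sketch of that conversion, assuming Spark MLlib's default "prediction" output column (the helper name here is hypothetical, not code from this PR):

```python
import numpy as np

def predictions_to_numpy(pred_df):
    # Collect the prediction column to the driver and expose it as a
    # NumPy array so tests can assert on raw values
    rows = pred_df.select("prediction").collect()
    return np.array([row["prediction"] for row in rows])
```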
* model_class -> ModelClass
* Remove comment
Disclaimer -- the tests in this PR only work locally when Spark is installed. PySpark does not work out of the box when its underlying dependencies (Java, Spark) are not installed.
Right now I am looking more closely at the Clipper example to understand how they handled this issue, but I would love to hear maintainer opinions about how dependencies may have been handled in other cases (e.g., in relation to Docker image size concerns).
+1, keep at it @joshuacwnewton! Let me know if there is any way we can help this initiative.
@joshuacwnewton one way to include Java (openjdk) is using Conda. We can add:

```python
class PysparkModelArtifact(BentoServiceArtifact):
    def set_dependencies(self, env: BentoServiceEnv):
        env.add_conda_dependencies(['openjdk'])
        env.add_pip_dependencies_if_missing(['pyspark'])
```

I did a quick test and it works well.
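(This works because Conda distributes openjdk as a regular package, so the generated service environment carries its own Java runtime instead of relying on one in the Docker base image; the trade-off is a larger Conda environment, which bears on the image-size concern raised above.)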
My apologies, but my personal situation has changed, and I'm no longer able to work on this feature. 🙁

@yubozhao Good point! That just leaves Spark (and possibly Scala).

@Talador12 What needs help most is figuring out how to provide the dependencies PySpark needs. The questions I'm left with are:

Sorry I can't be of more help! Best of luck on this.
@joshuacwnewton I am sorry to hear that your situation has changed. I can't thank you enough for everything you did for this community. It was great to work with you. I am looking forward to your return in the future.
Codecov Report
@@            Coverage Diff             @@
##           master     #957      +/-   ##
==========================================
- Coverage   62.86%   62.61%   -0.26%
==========================================
  Files         123      126       +3
  Lines        8112     8175      +63
==========================================
+ Hits         5100     5119      +19
- Misses       3012     3056      +44
==========================================
Continue to review full report at Codecov.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Description
Adds:

* PySparkModelArtifact in bentoml/artifact/pyspark_model_artifact.py, registered in bentoml/artifact/__init__.py (a rough sketch of such an artifact's surface follows below)
* An example service (PySparkClassifier) in tests/bento_service_examples
* Changes to .travis.yml to properly deal with Spark dependencies

Requesting code review to discuss design choices. This is my first time using Spark/PySpark, so any comments about idiomatic Spark code are much appreciated! (Also, PySparkModelArtifact contains TODOs referencing some of the design details in #666 (comment).) Thanks much! 😄
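As a reading aid for the artifact listed above, here is a hedged sketch of the pack/save/load surface a BentoService artifact for Spark MLlib models might expose. The class name, path helper, and method bodies are illustrative assumptions, not the code in this PR; PipelineModel.load is the stock pyspark.ml loader for fitted pipelines.

```python
# Illustrative sketch only; the real PySparkModelArtifact in this PR may differ.
import os
from pyspark.ml import PipelineModel

class PySparkModelArtifactSketch:
    def __init__(self, name):
        self.name = name
        self._model = None

    def _model_path(self, base_path):
        return os.path.join(base_path, self.name)

    def pack(self, model):
        # Keep a fitted pyspark.ml model or pipeline in memory
        self._model = model
        return self

    def save(self, dst):
        # Spark ML writables persist themselves to a directory tree
        self._model.save(self._model_path(dst))

    def load(self, path):
        # PipelineModel.load restores fitted pipelines; a bare model would
        # need its matching <ModelClass>.load(...) instead
        return self.pack(PipelineModel.load(self._model_path(path)))
```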
Motivation and Context
Fixes #666.
How Has This Been Tested?
Tests included in this PR were run in the following environment:
Spark installation: (console output omitted)
Environment packages: (console output omitted)
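For reference, a smoke test of the kind the omitted console output would capture might look like the following; this is an assumed minimal check, not a test from the PR:

```python
# Minimal local-mode Spark smoke test; assumes pyspark is installed and a
# JDK is available on the PATH.
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[1]").appName("smoke-test").getOrCreate()
assert spark.range(10).count() == 10  # trivial job exercising the JVM bridge
spark.stop()
```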
Types of changes
Component(s) if applicable
Checklist:
The ./dev/format.sh and ./dev/lint.sh scripts have passed (instructions).