
feat: Support distributed batch inferencing job on Apache Spark cluster #890

Closed
parano opened this issue Jul 13, 2020 · 8 comments
@parano
Member

parano commented Jul 13, 2020

No description provided.

@parano parano added new feature help-wanted An issue currently lacks a contributor labels Jul 13, 2020
@Talador12

Is this related to #666 and #957?

@parano
Member Author

parano commented Aug 20, 2020

@Talador12 sorry, I haven't had a chance to fill in the issue description yet. This one is very different from #666 and #957. This ticket is about applying an ML model packaged with BentoML to a large dataset on a Spark cluster. It should work for models trained with any of the ML frameworks that BentoML supports (e.g., TensorFlow, scikit-learn, etc.), whereas #666 is about serving Spark MLlib models in BentoML.

Note that users can already do this with BentoML and Spark today, although we want to provide a set of tools on top of the existing BentoML input adapter API to make working with Spark's data types easier.
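For illustration, here is a minimal sketch of the pattern that works today: each Spark Python worker loads the saved bundle from a shared path and runs inference inside a pandas UDF. The bundle path, the `bentoml.load` call, and the `predict` API name are assumptions for illustration, not a committed integration:

```python
import pandas as pd
from pyspark.sql import SparkSession
from pyspark.sql.functions import pandas_udf
from pyspark.sql.types import DoubleType

BUNDLE_PATH = "/models/IrisClassifier"  # hypothetical saved-bundle path

_service = None  # cached per Python worker process


def _load_service():
    # Load the saved bundle lazily so each executor pays the cost once.
    global _service
    if _service is None:
        import bentoml
        _service = bentoml.load(BUNDLE_PATH)
    return _service


@pandas_udf(DoubleType())
def predict_udf(features: pd.Series) -> pd.Series:
    svc = _load_service()
    # "predict" is whatever inference API the service defines.
    preds = svc.predict(features.to_frame("feature"))
    return pd.Series(preds)


spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(0.1,), (0.4,), (0.9,)], ["feature"])
df.withColumn("prediction", predict_udf("feature")).show()
```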

@xuzikun2003
Collaborator

I would like to pick up this work. Here is the design doc:
https://docs.google.com/document/d/1C7_BT1kIF8Z2YJXioPUSg5J0yfDEfN5g5zYtcZW2Nx8/edit?usp=sharing

@parano
Member Author

parano commented Jan 19, 2021

Have been discussing this with @xuzikun2003 @bojiang and here's an update:

We are investigating making the BentoService class/instance pickle-serializable by hooking the pickle interface into BentoML's own save and load implementation. This should allow users to create Spark UDFs from BentoML-packaged ML models more easily.

Note that this will be a separate effort from the design doc shared above, which describes BentoML's own batch inference API. The batch inference jobs API is a high-level API for launching and managing batch inference jobs, whereas the Spark UDF integration gives the user more flexibility when working with a Spark application.
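To make the direction concrete, here is a hypothetical sketch of what the UDF side could look like if pickling works as described, i.e., BentoService delegates its pickle behavior to BentoML's own save/load. The bundle path and `predict` API name are illustrative:

```python
import bentoml  # assumed: bentoml.load returns a pickle-capable service
import pandas as pd
from pyspark.sql import SparkSession
from pyspark.sql.functions import pandas_udf
from pyspark.sql.types import DoubleType

spark = SparkSession.builder.getOrCreate()

# Load once on the driver; with pickle support, the service instance
# itself can be captured in the UDF closure and shipped to executors.
svc = bentoml.load("/models/IrisClassifier")  # hypothetical bundle path


@pandas_udf(DoubleType())
def predict_udf(features: pd.Series) -> pd.Series:
    # `svc` is pickled into the closure; this only works if BentoService
    # hooks its pickle interface into save/load as proposed above.
    return pd.Series(svc.predict(features.to_frame("feature")))


df = spark.createDataFrame([(0.2,), (0.7,)], ["feature"])
df.withColumn("prediction", predict_udf("feature")).show()
```

Compared with loading from a path on every worker, this would remove the need to distribute the bundle out-of-band, since Spark's own closure serialization carries the model.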

@parano parano removed MLH help-wanted An issue currently lacks a contributor labels Jan 19, 2021
@parano parano self-assigned this Jan 19, 2021
@Talador12

This sounds like the right approach; Spark UDFs were made for exactly this kind of custom code/integration. Thank you for taking the initiative on this! I will continue to follow along and provide user feedback when I can :)

@stale

stale bot commented Jun 2, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Jun 2, 2021
@stale stale bot closed this as completed Jun 18, 2021
@parano parano reopened this Jun 19, 2021
@stale stale bot removed the stale label Jun 19, 2021
@alexdivet

Is this on the roadmap for BentoML 1.0?

@yubozhao
Contributor

yubozhao commented Aug 4, 2022

@Talador12 @alexdivet We are focusing on streaming and batching now after building a solid foundation with 1.0. Would love to hear any feedback.

@ssheng ssheng changed the title Support distributed batch inferencing job on Apache Spark cluster feat: Support distributed batch inferencing job on Apache Spark cluster Aug 9, 2022
@ssheng ssheng closed this as completed Jan 20, 2023