
feat: Support distributed batch inferencing job on Apache Spark cluster #890

Closed
parano opened this issue Jul 13, 2020 · 8 comments
@parano
Member

parano commented Jul 13, 2020

No description provided.

@parano parano added new feature help-wanted An issue currently lacks a contributor labels Jul 13, 2020
@Talador12

Is this related to #666 and #957?

@parano
Member Author

parano commented Aug 20, 2020

@Talador12 sorry, I haven't had a chance to fill in the issue description yet. This one is very different from #666 and #957. This ticket is about applying an ML model packaged with BentoML to a large dataset on a Spark cluster. It should work for models trained with any of the ML frameworks that BentoML supports (e.g., TensorFlow, scikit-learn, etc.), whereas #666 is about serving Spark MLlib models in BentoML.

Note that users can already do this with BentoML and Spark today, although we want to provide a set of tools on top of the existing BentoML input adapter API to make working with Spark's data types easier.
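For illustration, here is a minimal sketch of the pattern that works today: each Spark Python worker loads the saved bundle from a shared path and runs inference inside a pandas UDF. The bundle path, the `bentoml.load` call, and the `predict` API name are assumptions for illustration, not a committed integration:

```python
import pandas as pd
from pyspark.sql import SparkSession
from pyspark.sql.functions import pandas_udf
from pyspark.sql.types import DoubleType

BUNDLE_PATH = "/models/IrisClassifier"  # hypothetical saved-bundle path

_service = None  # cached per Python worker process


def _load_service():
    # Load the saved bundle lazily so each executor pays the cost once.
    global _service
    if _service is None:
        import bentoml
        _service = bentoml.load(BUNDLE_PATH)
    return _service


@pandas_udf(DoubleType())
def predict_udf(features: pd.Series) -> pd.Series:
    svc = _load_service()
    # "predict" is whatever inference API the service defines.
    preds = svc.predict(features.to_frame("feature"))
    return pd.Series(preds)


spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(0.1,), (0.4,), (0.9,)], ["feature"])
df.withColumn("prediction", predict_udf("feature")).show()
```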

@xuzikun2003
Collaborator

I would like to pick up this work. Here is the design doc:
https://docs.google.com/document/d/1C7_BT1kIF8Z2YJXioPUSg5J0yfDEfN5g5zYtcZW2Nx8/edit?usp=sharing

@parano
Member Author

parano commented Jan 19, 2021

Have been discussing this with @xuzikun2003 @bojiang and here's an update:

We are investigating making the BentoService class/instance pickle-serializable by hooking the pickle interface into BentoML's own save and load implementation. This should allow users to create Spark UDFs from BentoML-packaged ML models more easily.

Note that this will be a separate effort from the design doc shared above, which describes BentoML's own batch inference API. The batch inference jobs API is a high-level API for launching and managing batch inference jobs, whereas the Spark UDF integration gives the user more flexibility when working with a Spark application.
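To make the direction concrete, here is a hypothetical sketch of what the UDF side could look like if pickling works as described, i.e., BentoService delegates its pickle behavior to BentoML's own save/load. The bundle path and `predict` API name are illustrative:

```python
import bentoml  # assumed: bentoml.load returns a pickle-capable service
import pandas as pd
from pyspark.sql import SparkSession
from pyspark.sql.functions import pandas_udf
from pyspark.sql.types import DoubleType

spark = SparkSession.builder.getOrCreate()

# Load once on the driver; with pickle support, the service instance
# itself can be captured in the UDF closure and shipped to executors.
svc = bentoml.load("/models/IrisClassifier")  # hypothetical bundle path


@pandas_udf(DoubleType())
def predict_udf(features: pd.Series) -> pd.Series:
    # `svc` is pickled into the closure; this only works if BentoService
    # hooks its pickle interface into save/load as proposed above.
    return pd.Series(svc.predict(features.to_frame("feature")))


df = spark.createDataFrame([(0.2,), (0.7,)], ["feature"])
df.withColumn("prediction", predict_udf("feature")).show()
```

Compared with loading from a path on every worker, this would remove the need to distribute the bundle out-of-band, since Spark's own closure serialization carries the model.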

@parano parano removed MLH help-wanted An issue currently lacks a contributor labels Jan 19, 2021
@parano parano self-assigned this Jan 19, 2021
@Talador12

This sounds like the right approach; Spark UDFs were made for exactly this kind of custom code/integration. Thank you for taking the initiative on this! I will continue to follow along and provide user feedback when I can :)

@stale

stale bot commented Jun 2, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Jun 2, 2021
@stale stale bot closed this as completed Jun 18, 2021
@parano parano reopened this Jun 19, 2021
@stale stale bot removed the stale label Jun 19, 2021
@alexdivet

Is this on the roadmap for BentoML 1.0?

@yubozhao
Contributor

yubozhao commented Aug 4, 2022

@Talador12 @alexdivet We are focusing on streaming and batching now after building a solid foundation with 1.0. Would love to hear any feedback.

@ssheng ssheng changed the title Support distributed batch inferencing job on Apache Spark cluster feat: Support distributed batch inferencing job on Apache Spark cluster Aug 9, 2022
@ssheng ssheng closed this as completed Jan 20, 2023