"Start Offline-to-online ingestion" method in Python SDK #1051
Conversation
I wonder if there is a way to structure this without the diamond inheritance? After re-reading this a couple of times, I subjectively have a pretty hard time following what implements which methods and what gets called when.

One way I can think of is to split the SparkJob interface into two (sketched below):
- SparkJobParams, which deals with preparing the job to launch (so it provides the file path / class name / args)
- SparkJob, which deals purely with the job lifecycle after it has been launched (so: id, name, status)
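To make the proposed split concrete, here is a minimal sketch of the two interfaces as abstract base classes; the method names and the status enum are illustrative, not code from the PR:

```python
# Minimal sketch of the suggested split; method names are illustrative.
import abc
from enum import Enum
from typing import List, Optional


class SparkJobParameters(abc.ABC):
    """Everything a launcher needs *before* the job starts."""

    @abc.abstractmethod
    def get_name(self) -> str:
        """Human-readable job name."""
        raise NotImplementedError

    @abc.abstractmethod
    def get_main_file_path(self) -> str:
        """Path to the pyspark script or jar to submit."""
        raise NotImplementedError

    @abc.abstractmethod
    def get_class_name(self) -> Optional[str]:
        """Entrypoint class for jar-based jobs; None for pyspark scripts."""
        raise NotImplementedError

    @abc.abstractmethod
    def get_arguments(self) -> List[str]:
        """Command-line arguments passed to the job."""
        raise NotImplementedError


class SparkJobStatus(Enum):
    IN_PROGRESS = 1
    FAILED = 2
    COMPLETED = 3


class SparkJob(abc.ABC):
    """Handle to a job *after* it has been launched: pure lifecycle."""

    @abc.abstractmethod
    def get_id(self) -> str:
        raise NotImplementedError

    @abc.abstractmethod
    def get_status(self) -> SparkJobStatus:
        raise NotImplementedError
```

Splitting this way keeps launch-time configuration separate from the handle a launcher returns, so no class has to inherit from both sides at once.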
Force-pushed from 9f8e086 to ebf8c6a
Force-pushed from ebf8c6a to 42c4480
@oavdeev that makes sense.
Force-pushed from aadd53e to 664ab78
self._entity_source = entity_source
self._destination = destination

def get_name(self) -> str:
There is a chance that we'll run into limits on name length on different platforms; on EMR it is 256 chars max.
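A length guard along these lines could avoid that; this helper is hypothetical, not part of the PR, and uses EMR's 256-character maximum as the bound:

```python
# Hypothetical guard against platform job-name limits; not from the PR.
import hashlib

MAX_JOB_NAME_LENGTH = 256  # EMR's documented maximum


def safe_job_name(name: str, max_length: int = MAX_JOB_NAME_LENGTH) -> str:
    if len(name) <= max_length:
        return name
    # Keep a recognizable prefix and append a short hash so that
    # truncated names stay unique.
    digest = hashlib.sha1(name.encode("utf-8")).hexdigest()[:8]
    return f"{name[: max_length - len(digest) - 1]}-{digest}"
```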
start: datetime,
end: datetime,
jar: str,
**kwargs,
Doesn't look like we use kwargs.
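For illustration, the constructor from this diff context with the unused **kwargs dropped; the enclosing class name is hypothetical, since it is not visible in the hunk:

```python
from datetime import datetime


class RetrievalJobParameters:  # hypothetical name; the real class is not shown in this hunk
    def __init__(self, start: datetime, end: datetime, jar: str):
        # **kwargs dropped: nothing consumed it, and silently accepting
        # unknown keyword arguments can hide caller typos.
        self._start = start
        self._end = end
        self._jar = jar
```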
sdk/python/feast/pyspark/launcher.py (Outdated)
@@ -123,22 +129,45 @@ def start_historical_feature_retrieval_job(
job_id: str,
I think it would make sense to remove job_id here as well.
Removed the job_id.
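If callers no longer supply it, the launcher can mint the id itself; a sketch, with the uuid scheme being an assumption for illustration rather than what the PR actually does:

```python
import uuid


def generate_job_id(prefix: str = "feast-spark") -> str:
    # Sketch of launcher-side id generation: once callers stop passing
    # job_id, the launcher can create a unique id itself.
    return f"{prefix}-{uuid.uuid4().hex}"
```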
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: khorshuheng, oavdeev, pyalex

The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment.
Co-authored-by: Oleg Avdeev <oleg.v.avdeev@gmail.com>
Signed-off-by: Khor Shu Heng <khor.heng@gojek.com>
Force-pushed from 77fca22 to e23ed4f
/lgtm
What this PR does / why we need it:
This PR aims to glue together the various backend implementations of job submission and the Feast SDK by introducing a cleaner structure with SparkJob and SparkJobParameters interfaces and separate launch modules. It also adds offline_to_online_ingestion to the JobLauncher interface (a sketch of the new hook follows below); implementations for the standalone and Dataproc launchers are provided as well.
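For context, here is a hedged sketch of what the new JobLauncher hook could look like; SparkJobParameters and SparkJob refer to the interfaces discussed earlier in the thread, and the exact signature is an assumption rather than the PR's code:

```python
import abc


class JobLauncher(abc.ABC):
    @abc.abstractmethod
    def offline_to_online_ingestion(
        self, ingestion_job_params: "SparkJobParameters"
    ) -> "SparkJob":
        """Submit a batch job that loads features for a time range from
        the offline store into the online store, and return a handle
        exposing the job's id and status."""
        raise NotImplementedError
```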
Which issue(s) this PR fixes:
Fixes #
Does this PR introduce a user-facing change?: