"Start Offline-to-online ingestion" method in Python SDK #1051
Conversation
I wonder if there is a way to structure this without the diamond inheritance? After re-reading this a couple of times, I subjectively have a pretty hard time following what implements which methods and what gets called when.

One way I can think of is to split the SparkJob interface into two (sketched below):
- SparkJobParams, which deals with preparing the job to launch (so it provides the file path / class name / args)
- SparkJob, which deals purely with the job lifecycle after it has been launched (so: id, name, status)
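To make the proposed split concrete, here is a minimal sketch of the two interfaces as abstract base classes; the method names and the status enum are illustrative, not code from the PR:

```python
# Minimal sketch of the suggested split; method names are illustrative.
import abc
from enum import Enum
from typing import List, Optional


class SparkJobParameters(abc.ABC):
    """Everything a launcher needs *before* the job starts."""

    @abc.abstractmethod
    def get_name(self) -> str:
        """Human-readable job name."""
        raise NotImplementedError

    @abc.abstractmethod
    def get_main_file_path(self) -> str:
        """Path to the pyspark script or jar to submit."""
        raise NotImplementedError

    @abc.abstractmethod
    def get_class_name(self) -> Optional[str]:
        """Entrypoint class for jar-based jobs; None for pyspark scripts."""
        raise NotImplementedError

    @abc.abstractmethod
    def get_arguments(self) -> List[str]:
        """Command-line arguments passed to the job."""
        raise NotImplementedError


class SparkJobStatus(Enum):
    IN_PROGRESS = 1
    FAILED = 2
    COMPLETED = 3


class SparkJob(abc.ABC):
    """Handle to a job *after* it has been launched: pure lifecycle."""

    @abc.abstractmethod
    def get_id(self) -> str:
        raise NotImplementedError

    @abc.abstractmethod
    def get_status(self) -> SparkJobStatus:
        raise NotImplementedError
```

Splitting this way keeps launch-time configuration separate from the handle a launcher returns, so no class has to inherit from both sides at once.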
Force-pushed from 9f8e086 to ebf8c6a
Force-pushed from ebf8c6a to 42c4480
@oavdeev that makes sense.
Force-pushed from aadd53e to 664ab78
self._entity_source = entity_source
self._destination = destination

def get_name(self) -> str:
There is a chance that we'll run into limits on name length on different platforms; on EMR it is 256 chars max.
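A length guard along these lines could avoid that; this helper is hypothetical, not part of the PR, and uses EMR's 256-character maximum as the bound:

```python
# Hypothetical guard against platform job-name limits; not from the PR.
import hashlib

MAX_JOB_NAME_LENGTH = 256  # EMR's documented maximum


def safe_job_name(name: str, max_length: int = MAX_JOB_NAME_LENGTH) -> str:
    if len(name) <= max_length:
        return name
    # Keep a recognizable prefix and append a short hash so that
    # truncated names stay unique.
    digest = hashlib.sha1(name.encode("utf-8")).hexdigest()[:8]
    return f"{name[: max_length - len(digest) - 1]}-{digest}"
```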
start: datetime,
end: datetime,
jar: str,
**kwargs,
Doesn't look like we use kwargs.
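For illustration, the constructor from this diff context with the unused **kwargs dropped; the enclosing class name is hypothetical, since it is not visible in the hunk:

```python
from datetime import datetime


class RetrievalJobParameters:  # hypothetical name; the real class is not shown in this hunk
    def __init__(self, start: datetime, end: datetime, jar: str):
        # **kwargs dropped: nothing consumed it, and silently accepting
        # unknown keyword arguments can hide caller typos.
        self._start = start
        self._end = end
        self._jar = jar
```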
sdk/python/feast/pyspark/launcher.py (Outdated)
@@ -123,22 +129,45 @@ def start_historical_feature_retrieval_job(
job_id: str,
I think it would make sense to remove job_id here as well.
Removed the job_id.
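If callers no longer supply it, the launcher can mint the id itself; a sketch, with the uuid scheme being an assumption for illustration rather than what the PR actually does:

```python
import uuid


def generate_job_id(prefix: str = "feast-spark") -> str:
    # Sketch of launcher-side id generation: once callers stop passing
    # job_id, the launcher can create a unique id itself.
    return f"{prefix}-{uuid.uuid4().hex}"
```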
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: khorshuheng, oavdeev, pyalex

The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment.
Co-authored-by: Oleg Avdeev <oleg.v.avdeev@gmail.com>
Signed-off-by: Khor Shu Heng <khor.heng@gojek.com>
Force-pushed from 77fca22 to e23ed4f
/lgtm
What this PR does / why we need it:
This PR aims to glue together the various backend implementations of job submission and the Feast SDK by introducing a cleaner structure with SparkJob and SparkJobParameters interfaces and separate launch modules. It also adds offline_to_online_ingestion to the JobLauncher interface (a sketch of the new hook follows below); implementations for the standalone and Dataproc launchers are provided as well.
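For context, here is a hedged sketch of what the new JobLauncher hook could look like; SparkJobParameters and SparkJob refer to the interfaces discussed earlier in the thread, and the exact signature is an assumption rather than the PR's code:

```python
import abc


class JobLauncher(abc.ABC):
    @abc.abstractmethod
    def offline_to_online_ingestion(
        self, ingestion_job_params: "SparkJobParameters"
    ) -> "SparkJob":
        """Submit a batch job that loads features for a time range from
        the offline store into the online store, and return a handle
        exposing the job's id and status."""
        raise NotImplementedError
```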
Which issue(s) this PR fixes:
Fixes #
Does this PR introduce a user-facing change?: