[FeatureRequest]KFP execution cache #2904

rui5i · 2020-01-23T19:38:29Z

Mutate webhook

- webhook service
- execution_key generator
- pod modification controller
- max_cache_staleness support

MySQL

- persist execution_key

Releasing

- standalone deployment
- mkp deployment
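To make the execution_key generator item concrete, here is a minimal Go sketch that derives a key by SHA-256 hashing a JSON serialization of a step's definition, so identical steps map to the same key. The `StepTemplate` type, its fields, and `executionKey` are hypothetical stand-ins; the cache server may hash different parts of the pod or Argo template.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
)

// StepTemplate is a hypothetical, simplified view of what determines a
// step's output: image, command, arguments and input values.
type StepTemplate struct {
	Image   string            `json:"image"`
	Command []string          `json:"command"`
	Args    []string          `json:"args"`
	Inputs  map[string]string `json:"inputs"`
}

// executionKey serializes the template to JSON and returns the hex-encoded
// SHA-256 digest. Equal templates always produce equal keys.
func executionKey(t StepTemplate) (string, error) {
	b, err := json.Marshal(t)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(b)
	return hex.EncodeToString(sum[:]), nil
}
```

Because `encoding/json` writes struct fields in declaration order and map keys in sorted order, two templates with the same inputs hash identically regardless of how their input maps were built.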

rui5i · 2020-01-23T19:38:40Z

/assign @rui5i

rmgogogo · 2020-02-06T03:03:10Z

Not sure which is the first-phase target: SDK-to-API, right?

rui5i · 2020-02-06T03:09:25Z

Sorry for the confusion. I've updated the issue content. We changed the project to "KFP execution cache", and retrying from a certain step will become an underlying CUJ. There will not be any new APIs. The design doc will be sent out very soon.

elikatsis · 2020-02-11T11:24:59Z

Hello @rui5i,

That's a really nice feature!
Any update on the design doc? I think it should be an item in the upcoming KFP community meeting.

Thanks!

rui5i · 2020-02-11T19:13:30Z

Hi @elikatsis ,

Thanks for your feedback! The design doc is currently under KFP team review. I am happy to present it at the upcoming KFP community meeting!

elikatsis · 2020-02-18T13:36:52Z

Ping!
I see there is no entry in the agenda. Any update on that?

Padarn · 2020-03-27T02:04:38Z

Also interested in any docs on this, or updates on the release. I saw some meeting notes here: https://docs.google.com/document/d/1KB5KD8TvcrnxQX0xluHRnUdRYkM2-5-vB2V1wlxS4GY/edit#heading=h.gt0qfhljl8xo but it's not clear to me if a decision has been made.

rui5i · 2020-03-27T02:41:23Z

Hi, thanks for asking! The link you provided is our caching design doc. We are aiming to make it available in 0.3.1. I'll let you know once it's released.

Padarn · 2020-03-27T02:46:50Z

Awesome. Feel free to ask if there is anything the community can help with.

elikatsis · 2020-03-30T12:24:41Z

Hi again,
nice job with the implementation so far! Keep it up!

I wanted to ask: are there any plans for what you will show as a step's logs in the KFP UI? Or will cached steps appear with empty logs?
I haven't spotted any change related to that, so I assume (correct me if I'm wrong) that if we deployed the feature and cached a step right now, its logs would be empty.

rui5i · 2020-03-31T20:22:59Z

Hi @elikatsis ,

Thanks for checking in! Currently, if a step's result is taken from the cache, the step log shows "This step output is taken from cache." https://github.com/kubeflow/pipelines/blob/master/backend/src/cache/server/mutation.go#L119. In the future we may explore whether it's possible to show a link to the previous run/step.
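A minimal sketch of what that mutation amounts to, using hypothetical stand-in types rather than the Kubernetes API structs the real webhook in mutation.go works with: the cached pod's containers are swapped for a single container that only echoes the message, so that message becomes the step's log. The container name `main`, the `alpine` image, and `replaceWithCachedStep` are assumptions for illustration.

```go
package main

// Container and Pod are hypothetical, trimmed-down stand-ins for the
// Kubernetes core/v1 types, kept minimal so the sketch has no dependencies.
type Container struct {
	Name    string
	Image   string
	Command []string
	Args    []string
}

type Pod struct {
	Containers []Container
}

const cachedMessage = "This step output is taken from cache."

// replaceWithCachedStep rewrites a pod whose result was found in the cache
// so it does no real work: all of its containers are replaced by one
// container that only prints a fixed message.
func replaceWithCachedStep(pod *Pod) {
	pod.Containers = []Container{{
		Name:    "main",   // hypothetical container name
		Image:   "alpine", // hypothetical lightweight image
		Command: []string{"echo", cachedMessage},
	}}
}
```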

rui5i · 2020-04-10T18:38:08Z

Kubeflow Pipelines step caching is now released in 0.4.0 and later. Closing this issue.

Bobgy · 2020-04-15T03:33:34Z

/reopen

TODO:

Add a UI indication that a step is cached ([UI] Show cached steps #3602)
/assign @Bobgy

I will finish the UI integration

k8s-ci-robot · 2020-04-15T03:33:39Z

@Bobgy: Reopened this issue.


Bobgy · 2020-04-23T06:43:33Z

@rui5i @Ark-kun I took a look at a cached pod and read through the mutation hook's code, and it seems we don't have any information the UI can use to indicate that a pod is cached. Is that correct?

If so, where should we add this information?

Ark-kun · 2020-04-24T01:55:24Z

Looks like we did not add any extra labels, so the labels/annotations of reused pods are the same.
We could add a few annotations here: https://github.com/kubeflow/pipelines/blob/master/backend/src/cache/server/mutation.go#L126. For example, we could include the original start time and end time.

However, given a pod, it's pretty easy to detect that it was skipped (it does not have any Argo containers):

pipelines/backend/src/cache/server/mutation.go, line 133 in c4fb794:

Command: []string{`echo`, `"This step output is taken from cache."`},
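A sketch of that check, assuming Argo's executor sidecar is the container named `wait` and using a hypothetical minimal `Container` type instead of the Kubernetes API types: a step pod without that sidecar is treated as one the cache webhook short-circuited. `looksCached` is an illustrative name.

```go
package main

// Container is a hypothetical, minimal stand-in for the Kubernetes type.
type Container struct {
	Name    string
	Command []string
}

// argoWaitContainer is assumed to be the name of Argo's executor sidecar.
const argoWaitContainer = "wait"

// looksCached reports whether a step pod appears to have been replaced by
// the cache webhook, i.e. it no longer carries Argo's executor sidecar.
func looksCached(containers []Container) bool {
	for _, c := range containers {
		if c.Name == argoWaitContainer {
			return false
		}
	}
	return true
}
```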

There is also a way to detect skipped pods based on the WorkflowStatus alone: all output artifacts have the pod name in their URIs, but for skipped pods the pod name does not match the URIs. This genius idea belongs to @rui5i. (And there are now always some output artifacts, since we've enabled log archiving.)
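The artifact-based heuristic can be sketched as follows. `NodeStatus` is a hypothetical, trimmed-down view of a workflow node (its pod name plus output artifact keys), `reusedFromCache` is an illustrative name, and the artifact path layout in any usage is assumed. A plain substring match can misfire if one pod name is a prefix of another, so a production check would likely compare whole path segments.

```go
package main

import "strings"

// NodeStatus is a hypothetical view of an Argo node: the pod name (node ID)
// and the storage keys of its output artifacts.
type NodeStatus struct {
	ID           string
	ArtifactKeys []string
}

// reusedFromCache returns true when a node has output artifacts but none of
// their keys contain the node's own pod name, meaning the outputs were
// copied from an earlier execution. Nodes without artifacts are not flagged.
func reusedFromCache(n NodeStatus) bool {
	if len(n.ArtifactKeys) == 0 {
		return false
	}
	for _, key := range n.ArtifactKeys {
		if strings.Contains(key, n.ID) {
			return false
		}
	}
	return true
}
```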

Bobgy · 2020-04-24T03:48:42Z

@Ark-kun I'm a little worried that we are relying too much on Argo details:

- the UI needs to understand the Argo artifact path structure (part of the path is the workflow name)
- the UI needs to understand that the cache server modifies Argo outputs to reuse past outputs
- it assumes no other system modifies Argo outputs

The last two points are probably also risky for the cache server...

It really sounds to me like we should contribute the caching solution to Argo Workflows natively and add it to the workflow status (just a personal gut feeling; I guess that's not practical right now).

In the meantime, I'll have the UI use this hack to get it working first.

Bobgy · 2020-04-24T04:30:02Z

Digging through some related Argo issues, I understand that Argo isn't really accepting this feature request.

Bobgy · 2020-04-24T04:39:04Z

@Ark-kun Can I assume we only cache successful steps?

Ark-kun · 2020-04-29T19:30:13Z

> It really sounds to me like we should contribute the caching solution to Argo Workflows natively and add it to the workflow status (just a personal gut feeling; I guess that's not practical right now).

I'm positive about that. Half a year ago I even started a project to add caching to Argo, but did not finish it. Caching requires some persistence, and Argo did not have any DB at that time.

> @Ark-kun Can I assume we only cache successful steps?

We only reuse successful steps.

In the future we'll start reusing still-running steps.

k8s-ci-robot assigned rui5i Jan 23, 2020

rmgogogo added the status/triaged and kind/feature labels Feb 6, 2020

rui5i changed the title from [FeatureRequest]KFP retry on certain step to [FeatureRequest]KFP execution cache Feb 6, 2020

rui5i mentioned this issue Feb 11, 2020

[Backend]Initial execution cache #3036

Merged

rui5i mentioned this issue Feb 12, 2020

[Backend] Cache - pod filtering #3065

Merged

rui5i mentioned this issue Mar 13, 2020

[Backend]Cache - Cache logic with db interaction #3266

Merged

Bobgy mentioned this issue Mar 27, 2020

FR: caching to avoid long-running steps repetitively #1648

Closed

rui5i mentioned this issue Mar 27, 2020

[Manifest] Cache - Enable cache and cache deployer in base kustomization file #3376

Merged

rui5i closed this as completed Apr 10, 2020

k8s-ci-robot assigned Bobgy Apr 15, 2020

k8s-ci-robot reopened this Apr 15, 2020

Bobgy added the area/execution_cache label Apr 15, 2020

Bobgy mentioned this issue Apr 24, 2020

[UI] Show cached steps #3602

Merged

k8s-ci-robot closed this as completed in #3602 Apr 24, 2020

Ark-kun self-assigned this Apr 29, 2020
