Is there a way to disable cache on a specific pipeline (created through component yml) #4857
Comments
Have you tried setting max_cache_staleness to 0 on a certain step? https://www.kubeflow.org/docs/pipelines/caching/#managing-caching-staleness
How can I specify it in a YAML pipeline? I didn't find an example. My YAML file:
Thanks in advance.
Can you try adding "pipelines.kubeflow.org/cache_enabled:false" to your pipeline YAML's labels and see if this works?
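As a sketch, the suggested label would sit in the generated workflow's top-level metadata (exact placement depends on how your YAML is generated):

```yaml
metadata:
  labels:
    pipelines.kubeflow.org/cache_enabled: "false"
```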
I did as you suggested, and the pipeline was loaded as expected, but when I try to run it, it throws the following error:
Can you also try setting "pipelines.kubeflow.org/max_cache_staleness: 'P0D'" in the YAML annotations and remove the labels?
The nodes keep using the cached executions, even after the modifications.
Hi Alexey, can you help take a look at this issue? /assign @Ark-kun |
@cabjr Hello.
Please try this and tell us whether it helps. The format of the produced workflow files is an implementation detail and is subject to change.

```yaml
- name: some-name
  metadata:
    annotations:
      "pipelines.kubeflow.org/max_cache_staleness": P0D
  container: ...
```
P.S. I've noticed that your pipeline does not use any data passing: I see no components, no inputs and outputs, and no argument passing. System-managed data passing is one of the most important features of KFP and is important for getting value from it. The caching system relies on the data-passing information to decide when to reuse an execution (a cached value is reused when all input arguments are the same and the component is the same). Perhaps you can create KFP components with inputs and outputs for your pipeline steps and create a pipeline where they pass data explicitly. Then the caching will start working better for you without needing tweaks. Please check the following tutorial: https://github.com/Ark-kun/kfp_samples/blob/ae1a5b6/2019-10%20Kubeflow%20summit/106%20-%20Creating%20components%20from%20command-line%20programs/106%20-%20Creating%20components%20from%20command-line%20programs.ipynb
Hi @Ark-kun, I've tried adding the "pipelines.kubeflow.org/max_cache_staleness": P0D specification inside annotations, but it doesn't seem to work. I cannot specify "task_never_use_cache.execution_options.caching_strategy.max_cache_staleness = "P0D"" because my pipelines are generated (as YAML) from a Kedro pipeline (an ML framework for experimentation). That's also why I cannot specify/use the data inputs/outputs from Kubeflow Pipelines itself: internally my image already uses a data catalog that points to GCS and BigQuery. That's why I'm trying to disable the caching behavior. The main idea of my project is to let the DS team prototype with Kedro and then deploy to Kubeflow Pipelines with minimal (ideally no) modification. As it is supposed to run as recurring runs (jobs), the cache behavior is a problem for us.
@cabjr you might want to follow the instructions in https://www.kubeflow.org/docs/pipelines/caching/#disabling-caching-in-your-kubeflow-pipelines-deployment to disable caching for your KFP instance, so that no pipelines are cached.
But a reminder: running arbitrary Argo workflows with KFP may not keep working in the future; KFP has its own SDK for building workflows.
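For reference, the deployment-wide switch in the linked doc boils down to patching the cache webhook so it stops injecting cache metadata into new steps. A sketch, where the namespace value is an assumption for a standard install:

```shell
# Sketch following the linked Kubeflow caching doc: disable the
# cache-server's mutating webhook so new pipeline steps are not cached.
# Replace "kubeflow" with the namespace where KFP is installed.
export NAMESPACE=kubeflow
kubectl patch mutatingwebhookconfiguration "cache-webhook-${NAMESPACE}" \
  --type='json' \
  -p='[{"op": "replace", "path": "/webhooks/0/rules/0/operations/0", "value": "DELETE"}]'
```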
I've done as Bobgy suggested and disabled caching for the entire KFP instance. It might not be the ideal solution, but it works as expected. Thanks for the help!
Disabling cache for Pipelines v2:

```python
def some_pipeline():
    # task is a target step in a pipeline
    task_never_use_cache = some_op()
    task_never_use_cache.enable_caching = False
```
For those running into this and going insane because cached executions are still appearing even though you've disabled caching: there's a bug in the SDK. See my comment here: #10966 (comment)
What did you expect to happen:
Is there a way to disable cache on a specific pipeline (created through component yml) using Kubeflow Pipelines on GCP?
I have a pipeline that must run once a week. Because of the cache behavior, some nodes are not being executed again, since the inputs/parameters are always the same, even though internally each run does a SELECT in BigQuery (which fetches the updated data to preprocess). If I could disable this behavior, it would work as expected.
PS: I've tried the steps in https://www.kubeflow.org/docs/pipelines/caching/ but they didn't work with GCP Pipelines.
Any ideas?
Environment:
Google Cloud Platform
How did you deploy Kubeflow Pipelines (KFP)?
Through Google Cloud Platform ( AI Platform -> Pipelines)
/kind question