[sdk] Volume created by CreatePVC is not mounted to tasks in dsl.ParallelFor #10243

Closed
vereba opened this issue Nov 16, 2023 · 2 comments
Labels: area/sdk, kind/bug, lifecycle/stale

vereba commented Nov 16, 2023

Environment

KFP version:
2.0.3 (manifests v1.8 release)
KFP SDK version:

kfp                      2.4.0
kfp-kubernetes           1.0.0
kfp-pipeline-spec        0.2.2
kfp-server-api           2.0.3

Steps to reproduce

Given the following setup:

  • A volume is created within the pipeline using kfp.kubernetes.CreatePVC()
  • A producer task writes a file to the volume (with the volume mounted)
  • Consumer tasks inside a dsl.ParallelFor read the file from the volume (with the volume mounted)

Minimal example to reproduce

from kfp import dsl
from kfp import kubernetes as kfpk8s

@dsl.component(packages_to_install=["scikit-learn", "pandas"])
def load(save_dir: str):
    from sklearn import datasets
    import pandas as pd

    data = datasets.load_iris()
    df = pd.DataFrame(
        data=data.data,
        # Column order follows load_iris().feature_names (sepal first, then petal).
        columns=["Sepal Length", "Sepal Width", "Petal Length", "Petal Width"],
    )
    df.to_csv(save_dir)


@dsl.component(packages_to_install=["pandas"])
def print_results(load_dir: str):
    import pandas as pd
    
    data = pd.read_csv(load_dir)
    print(data)
   

@dsl.pipeline(
    name="Data Passing Using Temporary Volumes",
    description="A pipeline showing how to pass data using temporary volumes.",
)
def data_passing_with_temp_volume():
    create_pvc = kfpk8s.CreatePVC(
        pvc_name_suffix="-data-passing-volume",
        access_modes=["ReadWriteMany"],
        size="5Mi",
        storage_class_name="default",
    )

    mount_path = "/data-passing"
    file_name = f"{mount_path}/file.csv"

    load_task = load(save_dir=file_name)
    kfpk8s.mount_pvc(task=load_task, pvc_name=create_pvc.outputs["name"], mount_path=mount_path)
    load_task.set_caching_options(enable_caching=False)

    # printing outside ParallelFor works
    printing_task_outside = print_results(load_dir=file_name)
    printing_task_outside.set_caching_options(enable_caching=False)
    printing_task_outside.after(load_task)
    kfpk8s.mount_pvc(task=printing_task_outside, pvc_name=create_pvc.outputs["name"], mount_path=mount_path)
    
    # printing inside ParallelFor fails (see Error)
    with dsl.ParallelFor([1, 2, 3]) as num:
        printing_task = print_results(load_dir=file_name)
        printing_task.after(load_task)
        printing_task.set_caching_options(enable_caching=False)
        kfpk8s.mount_pvc(task=printing_task, pvc_name=create_pvc.outputs["name"], mount_path=mount_path)


# NOTE: `client` is assumed to be an already configured kfp.Client instance.
data_passing_with_temp_volume_result = client.create_run_from_pipeline_func(
    data_passing_with_temp_volume,
    arguments={},
    experiment_name="data-passing-using-temporary-volumes",
    namespace="your-ns",
    # Disable caching for example pipelines
    # currently, cannot disable due to https://github.com/kubeflow/pipelines/issues/10188
    #enable_caching=False,
)

Error

This scenario produces the following error in the logs of the component inside the ParallelFor:
KFP driver: driver.Container(pipelineName=data-passing-using-temporary-volumes, runID=c38d8eaa-baec-4d3c-95de-580db80099de, task="print-results-2", component="comp-print-results-2", dagExecutionID=14531, componentSpec, KubernetesExecutorConfig) failed: failed to extract volume mount info: failed to make podSpecPatch: volume mount: cannot find producer task createpvc

Expected result

The volume should be mounted to all tasks, including those inside dsl.ParallelFor.
Mounting works if the PVC of an already existing volume is used instead of one created by CreatePVC.

Impacted by this bug? Give it a 👍.

vereba changed the title from "[sdk] Volume created by CreatePVC is not moutned to tasks in dsl.ParallelFor" to "[sdk] Volume created by CreatePVC is not mounted to tasks in dsl.ParallelFor" on Nov 16, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

github-actions bot added the lifecycle/stale label on Feb 15, 2024

This issue has been automatically closed because it has not had recent activity. Please comment "/reopen" to reopen it.
