
mount_pvc failing when pipeline includes ResourceOps #3906

Closed
jackwhelpton opened this issue Jun 3, 2020 · 9 comments
Labels
area/sdk help wanted The community is welcome to contribute. kind/bug status/triaged Whether the issue has been explicitly triaged

Comments

@jackwhelpton
Contributor

What steps did you take:

Created a persistent volume claim using a VolumeOp:

    vop = dsl.VolumeOp(
        name="shared-data",
        resource_name="shared-data",
        size="100G",
        storage_class="nfs",
    ).set_display_name("Prepare volume")

Attempted to mount the PVC to every operation in a pipeline using:

    dsl.get_pipeline_conf().add_op_transformer(
        mount_pvc(pvc_name=vop.volume.persistent_volume_claim.claim_name, volume_mount_path=MOUNT_PATH)
    )

What happened:

During compilation, the add_op_transformer attempts to invoke the add_volume_mount method on the VolumeOp, producing an error:

AttributeError: 'VolumeOp' object has no attribute 'add_volume_mount'

What did you expect to happen:

The documentation states that transformers take and return a ContainerOp:

https://kubeflow-pipelines.readthedocs.io/en/latest/source/kfp.dsl.html#kfp.dsl.PipelineConf.add_op_transformer

Given that, I would expect them to be bypassed for VolumeOp (or other ResourceOp operations), and hence this compilation should succeed.

Environment:

KFP version: Build commit: 9c16e12

KFP SDK version:
kfp 0.5.1
kfp-server-api 0.3.0

Anything else you would like to add:


/kind bug

@elikatsis
Member

Hi @jackwhelpton,

The documentation seems wrong there. Since all kinds of ops share common functionality, it's the op transformer's job to check the op type before applying something exclusive to one specific kind of op.

Could you write a few-line wrapper around mount_pvc to do that check?
It should be something like:

def mount_pvc_transformer(pvc_name, mount_path):
    def transformer(op):
        if isinstance(op, dsl.ContainerOp):
            # mount_pvc() returns a transformer, which must be applied to the op
            mount_pvc(pvc_name=pvc_name, volume_mount_path=mount_path)(op)
        return op
    return transformer

Also, instead of mount_pvc you could use the following, since mount_pvc can fail if you attempt to mount multiple volumes with it and you don't have advanced K8s knowledge.

op.add_pvolumes({mount_path: dsl.PipelineVolume(pvc=pvc_name)})

@jackwhelpton
Contributor Author

Ah, that's useful, thanks! I was going to offer a PR to push that check into mount_pvc, but I notice you've done that too. Cheers.

Oddly enough I had been using add_pvolumes, but was starting to move away from it on the basis of #3385 (comment), specifically this bit:

Just note that pvolumes, add_pvolumes and VolumeOp are community contrib features and have some portability and support implications. The officially supported methods are add_volume and add_volume_mount.

What is the recommended approach here, then? It seems like best practice is still fairly scattered across docs, GitHub issues, etc.

@elikatsis
Member

elikatsis commented Jun 4, 2020

Yes, that opinion has been voiced here and there in a few issues.
Part of my answer here, and here, addresses it.

I don't think such a thing as an unsupported feature exists on the master branch of an OSS project, especially when we're talking about a method like add_pvolumes().
Indeed, using PVCs and other resources in general may come with implications depending on the cluster. However, I don't think any issues have been left unsupported or without a proposed workaround.
Moreover, the "officially supported methods" require users to learn Kubernetes specifics, and we (Arrikto) have put effort into removing that requirement as much as possible. A data scientist shouldn't need to know what a VolumeMount is, what types of Volumes exist, or how something gets mounted on a step. Data scientists know this: a filesystem called foo (the PVC name) should be found under a specific path.
For advanced use cases, a data scientist can learn K8s and use add_volume_mount with its full capabilities. When that's not the case, I don't see why add_pvolumes should be unofficial.

Of course, the Google team has the same goal; that's quite clear and there's no doubt about it. I mean, look at KFP 😄
I can't say why there's such an opinion on this specific matter.

@jackwhelpton
Contributor Author

There's one other subtlety about using mount_pvc: it removes the explicit dependency between stages and the VolumeOp that creates the PVC. At the moment I've solved that by doing this:

def mount(vop):
    # mount_pvc wrapper ensures the transformer only executes for ContainerOps.
    # This is required until https://github.com/kubeflow/pipelines/pull/3912 is merged.
    def transformer(op):
        if isinstance(op, dsl.ContainerOp):
            # mount_pvc() returns a transformer, which must be applied to the op
            onprem.mount_pvc(
                pvc_name=vop.volume.persistent_volume_claim.claim_name,
                volume_mount_path=MOUNT_PATH,
            )(op)
            op.after(vop)
        return op

    return transformer

which seems clunky (again, I've read that explicit dependencies between ops are a code smell). Is there a cleaner way to do this?

@elikatsis
Member

Using add_pvolumes() with vop.volume will do what you need.
Essentially, PipelineVolumes aim to be an abstraction over Kubernetes volumes that is also aware of dependencies, just as PipelineParams are for primitive data types.
add_pvolumes() is the way to apply the dependencies carried by PipelineVolumes to the ops using them (the method can also be used with other types of Kubernetes volumes, even though those don't carry dependencies).

Something like this should work:

def mount(vop, mount_path):
    def transformer(op):
        if isinstance(op, dsl.ContainerOp):
            op.add_pvolumes({mount_path: vop.volume})
        return op
    return transformer

Can you verify?

@jackwhelpton
Contributor Author

That's looking much better (I'll update this comment once the run actually completes, which takes around 30m, but it didn't get this far before).

This seems to be a good argument for using add_pvolumes: even with that op.after call I must have missed a dependency, as the pipeline wasn't executing correctly. However, this also prevents me from moving to the out-of-the-box mount_pvc even once your fix is merged, unless that helper function is updated to use add_pvolumes, doesn't it?

We're having a lot of discussions internally about how to make kubeflow usable for "pure" data scientists: I'm coming from much more of a traditional engineering background, but things that are straightforward for me are often not so for our data science team (and vice versa, obviously). I'm interested to read anything Google have written on how they address this internally...

@elikatsis
Member

Firstly, it wasn't me who opened the PR. Credit should go to @eterna2 😄

unless that helper function is updated to use add_pvolumes,

mount_pvc aims at the more generic and simpler request: mount this under that path. It does that fine, but it can lead to errors if you want multiple mounts.
Its purpose is not really connected to PipelineVolumes usage.

PipelineVolumes shine when used in live DSL, i.e. not via transformers, which apply something static to every op.
Think of PipelineVolumes as output parameters of steps, except they may imply multiple dependencies.
You may take a look at these examples: 1, 2

@jackwhelpton
Contributor Author

All looks good using add_pvolumes as a transformer, ta. I use a PipelineVolume in this scenario because one of my stages is a docker container that makes some very specific assumptions about where to find its data: to wit, it spawns sub-workers from a predefined YAML spec that take their inputs from a volume mount therein.

At some point I want to migrate what is effectively a hand-wired pipeline to kubeflow itself, but as an intermediate step I'm changing that YAML on-the-fly so it mounts a PVC that the pipeline sets up in advance. Actually, even if I were to migrate to more of a pure Kubeflow solution, I think I'd still be looking at a volume mount to fan-in results from a ParallelFor. There are several issues for this, e.g. #3412.

Given the dependency problem, it sounds like mount_pvc is a good helper when a static mount is being attached, but that a custom transformer along the lines of the one above would always be required to mount a volume that was created at runtime (via a VolumeOp). Fair summary?

@elikatsis
Member

Yep, sounds correct!

@jackwhelpton jackwhelpton changed the title add_op_transformer failing with ResourceOps mount_pvc failing when pipeline includes ResourceOps Jun 4, 2020
@rmgogogo rmgogogo added status/triaged Whether the issue has been explicitly triaged area/sdk help wanted The community is welcome to contribute. labels Jun 5, 2020
Bobgy pushed a commit that referenced this issue Jun 16, 2020
* Fix #3906 - check that ops to be transformed is a containerOp

* Update docstring for add_op_transformer to clarify that not only containerOp will be transformed.