
mount_pvc failing when pipeline includes ResourceOps #3906

Closed
jackwhelpton opened this issue Jun 3, 2020 · 9 comments
Labels
area/sdk help wanted The community is welcome to contribute. kind/bug status/triaged Whether the issue has been explicitly triaged

Comments

@jackwhelpton
Contributor

What steps did you take:

Created a persistent volume claim using a VolumeOp:

    vop = dsl.VolumeOp(
        name="shared-data",
        resource_name="shared-data",
        size="100G",
        storage_class="nfs",
    ).set_display_name("Prepare volume")

Attempted to mount the PVC to every operation in a pipeline using:

    dsl.get_pipeline_conf().add_op_transformer(
        mount_pvc(pvc_name=vop.volume.persistent_volume_claim.claim_name, volume_mount_path=MOUNT_PATH)
    )

What happened:

During compilation, the add_op_transformer attempts to invoke the add_volume_mount method on the VolumeOp, producing an error:

AttributeError: 'VolumeOp' object has no attribute 'add_volume_mount'

What did you expect to happen:

The documentation states that transformers take and return a ContainerOp:

https://kubeflow-pipelines.readthedocs.io/en/latest/source/kfp.dsl.html#kfp.dsl.PipelineConf.add_op_transformer

Given that, I would expect them to be bypassed for VolumeOp (or other ResourceOp operations), and hence this compilation should succeed.

Environment:

KFP version: Build commit: 9c16e12

KFP SDK version:
kfp 0.5.1
kfp-server-api 0.3.0

Anything else you would like to add:


/kind bug

@elikatsis
Member

Hi @jackwhelpton,

The documentation seems wrong there. Since all kinds of ops share common functionality, it's the op transformer's job to check the op type before applying something exclusive to one specific kind of op.

Could you write a few-line wrapper around mount_pvc to do that check?
It should be something like:

def mount_pvc_transformer(pvc_name, mount_path):
    def transformer(op):
        if isinstance(op, dsl.ContainerOp):
            # mount_pvc() returns a transformer, which must be applied to the op
            mount_pvc(pvc_name=pvc_name, volume_mount_path=mount_path)(op)
        return op
    return transformer

Also, instead of mount_pvc you could use the following, since mount_pvc can fail if you attempt to mount multiple volumes with it and you don't have advanced K8s knowledge.

op.add_pvolumes({mount_path: dsl.PipelineVolume(pvc=pvc_name)})

@jackwhelpton
Contributor Author

Ah, that's useful, thanks! I was going to offer a PR to push that check into mount_pvc, but I notice you've done that too. Cheers.

Oddly enough I had been using add_pvolumes, but was starting to move away from it on the basis of #3385 (comment), specifically this bit:

Just note that pvolumes, add_pvolumes and VolumeOp are community contrib features and have some portability and support implications. The officially supported methods are add_volume and add_volume_mount.

What is the recommended approach here, then? It seems like best practice is still fairly scattered across docs, GitHub issues, etc.

@elikatsis
Member

elikatsis commented Jun 4, 2020

Yes, that opinion has been voiced here and there in a few issues.
Part of my answer here, and here, addresses it.

I don't think such a thing as an unsupported feature exists on the master branch of an OSS project, especially when we're talking about a method like add_pvolumes().
Indeed, using PVCs and other resources in general may come with implications depending on the cluster. However, I don't think any issues have been left unsupported or without a proposed workaround.
Moreover, the "officially supported methods" require users to learn Kubernetes specifics, and we (Arrikto) have put effort into removing that requirement as much as possible. A data scientist shouldn't need to know what a VolumeMount is, what types of Volumes exist, or how something gets mounted on a step. Data scientists know this: a filesystem called foo (the PVC name) should be found under a specific path.
For advanced use cases, a data scientist can learn K8s and use add_volume_mount with its full capabilities. When that's not the case, I don't see why add_pvolumes should be unofficial.

Of course, the Google team has the same goal; that's quite clear and there's no doubt about it. I mean, look at KFP 😄
I can't say why there's such an opinion on this specific matter.

@jackwhelpton
Contributor Author

There's one other subtlety about using mount_pvc: it removes the explicit dependency between stages and the VolumeOp that creates the PVC. At the moment I've solved that by doing this:

def mount(vop):
    # mount_pvc wrapper ensures the transformer only executes for ContainerOps.
    # This is required until https://github.com/kubeflow/pipelines/pull/3912 is merged.
    def transformer(op):
        if isinstance(op, dsl.ContainerOp):
            # mount_pvc() returns a transformer, which must be applied to the op
            onprem.mount_pvc(
                pvc_name=vop.volume.persistent_volume_claim.claim_name,
                volume_mount_path=MOUNT_PATH,
            )(op)
            op.after(vop)
        return op

    return transformer

which seems clunky (again, I've read that explicit dependencies between ops are a code smell). Is there a cleaner way to do this?

@elikatsis
Member

Using add_pvolumes() with vop.volume will do what you need.
Essentially, PipelineVolumes aim to be an abstraction over Kubernetes volumes that is also aware of dependencies, just as PipelineParams are for primitive data types.
add_pvolumes() is the way to apply the dependencies carried by PipelineVolumes to the ops using them (the method can also be used with other types of Kubernetes volumes, even though those don't carry dependencies).

Something like this should work:

def mount(vop, mount_path):
    def transformer(op):
        if isinstance(op, dsl.ContainerOp):
            op.add_pvolumes({mount_path: vop.volume})
        return op
    return transformer

Can you verify?

@jackwhelpton
Contributor Author

That's looking much better (I'll update this comment once the run actually completes, which takes around 30m, but it didn't get this far before).

This seems to be a good argument for using add_pvolumes: even with that op.after call I must have missed a dependency, as the pipeline wasn't executing correctly. However, this also prevents me from moving to the out-of-the-box mount_pvc even once your fix is merged, unless that helper function is updated to use add_pvolumes, doesn't it?

We're having a lot of discussions internally about how to make kubeflow usable for "pure" data scientists: I'm coming from much more of a traditional engineering background, but things that are straightforward for me are often not so for our data science team (and vice versa, obviously). I'm interested to read anything Google have written on how they address this internally...

@elikatsis
Member

Firstly, it wasn't me who opened the PR. Credit should go to @eterna2 😄

unless that helper function is updated to use add_pvolumes,

mount_pvc aims at the more generic and simpler request: mount this under that path. It does that fine, but it can lead to errors if you want multiple mounts.
Its purpose is not really connected to PipelineVolumes usage.

PipelineVolumes shine when used in live DSL, i.e. not via transformers, which apply something static to every op.
Think of PipelineVolumes as output parameters of steps, except they may imply multiple dependencies.
You may take a look at these examples: 1, 2

@jackwhelpton
Contributor Author

All looks good using add_pvolumes as a transformer, ta. I use a PipelineVolume in this scenario because one of my stages is a docker container that makes some very specific assumptions about where to find its data: to wit, it spawns sub-workers from a predefined YAML spec that take their inputs from a volume mount therein.

At some point I want to migrate what is effectively a hand-wired pipeline to kubeflow itself, but as an intermediate step I'm changing that YAML on-the-fly so it mounts a PVC that the pipeline sets up in advance. Actually, even if I were to migrate to more of a pure Kubeflow solution, I think I'd still be looking at a volume mount to fan-in results from a ParallelFor. There are several issues for this, e.g. #3412.

Given the dependency problem, it sounds like mount_pvc is a good helper when a static mount is being attached, but that a custom transformer along the lines of the one above would always be required to mount a volume that was created at runtime (via a VolumeOp). Fair summary?

@elikatsis
Member

Yep, sounds correct!

@jackwhelpton jackwhelpton changed the title add_op_transformer failing with ResourceOps mount_pvc failing when pipeline includes ResourceOps Jun 4, 2020
@rmgogogo rmgogogo added status/triaged Whether the issue has been explicitly triaged area/sdk help wanted The community is welcome to contribute. labels Jun 5, 2020
Bobgy pushed a commit that referenced this issue Jun 16, 2020
* Fix #3906 - check that ops to be transformed is a containerOp

* Update docstring for add_op_transformer to clarify that not only containerOp will be transformed.