This repository has been archived by the owner on Sep 19, 2022. It is now read-only.

Migrate to new test-infra #316

Merged
merged 5 commits into from
Jan 26, 2021

PatrickXYS
Member

Which issue is resolved by this Pull Request:
Part of kubeflow/testing#861

Description of your changes:
Start by migrating pytorch-operator first.

@PatrickXYS
Member Author

/test ?

4 similar comments

@aws-kf-ci-bot

@PatrickXYS: The following commands are available to trigger jobs:

  • /test kubeflow-pytorch-operator-presubmit

Use /test all to run all jobs.

In response to this:

/test ?

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@PatrickXYS
Member Author

/test kubeflow-pytorch-operator-presubmit

@PatrickXYS
Member Author

/test ?

@aws-kf-ci-bot

@PatrickXYS: The following commands are available to trigger jobs:

  • /test kubeflow-pytorch-operator-presubmit

Use /test all to run all jobs.

In response to this:

/test ?

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@PatrickXYS
Member Author

/test kubeflow-pytorch-operator-presubmit

@Jeffwan
Member

Jeffwan commented Jan 25, 2021

@PatrickXYS What's the recent change?

@PatrickXYS
Member Author

@Jeffwan There's no change to the tests themselves; this PR only migrates from the old test-infra to the new test-infra.

@PatrickXYS
Member Author

/test kubeflow-pytorch-operator-presubmit

@PatrickXYS
Member Author

  Warning  Failed     104s   kubelet            Error: failed to start container "test": Error response from daemon: OCI runtime create failed: container_linux.go:370: starting container process caused: process_linux.go:459: container init caused: setenv: invalid argument: unknown

This seems to be related to an encoded secret; see opencontainers/runc#1720.
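For anyone hitting the same error, here is a rough sketch of a pre-flight check for secret values that end up as container environment variables. It rests on an assumption that is not confirmed in this thread: that the failure comes from Go's `syscall.Setenv`, which returns `EINVAL` for an empty name, a name containing `=` or a NUL byte, or a value containing a NUL byte. The helper name `setenv_problems` is hypothetical, and the input is assumed to be the base64 string stored under `data` in a Kubernetes Secret.

```python
import base64
from typing import List


def setenv_problems(name: str, b64_value: str) -> List[str]:
    """Return reasons why name/value could be rejected by setenv.

    Assumption: the value is base64-encoded, as in a Kubernetes Secret's
    `data` field, and is decoded before being exported to the container.
    """
    problems = []

    # An environment variable name must be non-empty and must not
    # contain '=' or a NUL byte.
    if not name:
        problems.append("empty variable name")
    elif "=" in name or "\x00" in name:
        problems.append("variable name contains '=' or NUL")

    # Decode the stored value and look for NUL bytes, which cannot be
    # represented in a process environment.
    raw = base64.b64decode(b64_value)
    if b"\x00" in raw:
        problems.append("value contains a NUL byte")

    return problems
```

An empty list means the pair passed these checks; it does not prove the secret is otherwise correct, for example when it was base64-encoded twice by mistake.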

@PatrickXYS
Member Author

/test kubeflow-pytorch-operator-presubmit

@PatrickXYS
Member Author

PatrickXYS commented Jan 25, 2021

handle object: patching object from cluster: merging object with existing state: unable to recognize \"/tmp/ksonnet-mergepatch992297811\": no matches for kind \"Workflow\" in version \"argoproj.io/v1alpha1\""
subprocess.CalledProcessError: Command 'cmd: ks-13 apply kubeflow-pytorch-operator-presubmit-e2e-316-b2e6905-7312-e89b -c workflows exited with code 1' returned non-zero exit status 1.

The Argo workflow was generated, but applying it failed. Checking.

@PatrickXYS
Member Author

PatrickXYS commented Jan 25, 2021

Not sure if this is a one-off error; kicking off another run to see how it goes.

I remember seeing this error before, but I don't remember what I did to get rid of it.

/test kubeflow-pytorch-operator-presubmit

@PatrickXYS
Member Author

PatrickXYS commented Jan 25, 2021

I see, it is most likely caused by the IAM role not having access to the Argo cluster:

# kubectl get ns
error: You must be logged in to the server (Unauthorized)

@PatrickXYS
Member Author

Succeeded in creating the Argo workflow in the Argo cluster:

# ks-13 apply pytorch-test-e2e-316-b2e6905-9e68 -c workflows
INFO Applying workflows kubeflow-test-infra.pytorch-test-e2e-316-b2e6905-9e68
INFO Creating non-existent workflows kubeflow-test-infra.pytorch-test-e2e-316-b2e6905-9e68

@PatrickXYS
Member Author

PatrickXYS commented Jan 25, 2021

@Jeffwan I forgot to mention that we should help the WG create an ECR registry in the new AWS account in advance.

I also found that the test-worker image 527798164940.dkr.ecr.us-west-2.amazonaws.com/aws-kubeflow-ci/test-worker:latest is hard-coded in the WG's repo.

We need to gather these references into a list and then migrate them.
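A minimal sketch of how such a list could be gathered, assuming the hard-coded references all follow the usual ECR image URI shape (12-digit account ID, `dkr.ecr`, region, `amazonaws.com`, repository path, optional tag). The function name `find_ecr_images` and the regular expression are illustrative and not part of kubeflow/testing.

```python
import os
import re
from collections import defaultdict
from typing import Dict, List

# Assumed shape of an ECR image reference:
#   <12-digit account>.dkr.ecr.<region>.amazonaws.com/<repo path>[:<tag>]
ECR_IMAGE = re.compile(
    r"\d{12}\.dkr\.ecr\.[a-z0-9-]+\.amazonaws\.com/[\w./-]+(?::[\w.-]+)?"
)


def find_ecr_images(root: str) -> Dict[str, List[str]]:
    """Map each ECR image reference found under root to the files using it."""
    found = defaultdict(list)
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip git metadata so packed objects are not scanned.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for filename in sorted(filenames):
            path = os.path.join(dirpath, filename)
            try:
                with open(path, encoding="utf-8", errors="ignore") as handle:
                    text = handle.read()
            except OSError:
                continue  # unreadable file, ignore it
            # Record each distinct image once per file.
            for image in set(ECR_IMAGE.findall(text)):
                found[image].append(os.path.relpath(path, root))
    return dict(found)
```

Running `find_ecr_images(".")` from a checkout of each WG repo would give one dictionary per repo, which could then be merged into the migration list.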

@PatrickXYS
Member Author

/test kubeflow-pytorch-operator-presubmit

@PatrickXYS
Member Author

The Travis CI job is queued waiting for resources... It might be better to use GitHub Actions for simple tests.

@Jeffwan Jeffwan self-assigned this Jan 25, 2021
@PatrickXYS
Member Author

PatrickXYS commented Jan 25, 2021

/hold

Let's hold this for more experiments.

@coveralls

coveralls commented Jan 25, 2021

Coverage Status

Coverage increased (+0.1%) to 64.23% when pulling 4fa3260 on PatrickXYS:migrate into 282cbee on kubeflow:master.

@PatrickXYS
Member Author

/test kubeflow-pytorch-operator-presubmit

@PatrickXYS
Member Author

Not sure why argo_endpoint still points to the old one; I checked locally, and on my machine it refers to the new one.

image

@PatrickXYS
Member Author

/test kubeflow-pytorch-operator-presubmit

@PatrickXYS
Member Author

I see, run_workflow.sh is embedded in the image:

https://github.com/kubeflow/testing/blob/master/images/Dockerfile.py3.aws#L83-L84

We need to rebuild the test-worker image.

@PatrickXYS
Member Author

/test kubeflow-pytorch-operator-presubmit

@PatrickXYS
Member Author

It works after rebuilding the test-worker image.

image

@PatrickXYS
Member Author

/cc @Jeffwan

Should be good to go

@Jeffwan
Member

Jeffwan commented Jan 26, 2021

/lgtm
/approve

@k8s-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Jeffwan, PatrickXYS

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@PatrickXYS
Member Author

/unhold
