
Testing - post submit tests are failing #4046

Closed
Bobgy opened this issue Jun 22, 2020 · 10 comments
Labels: area/testing, status/triaged (Whether the issue has been explicitly triaged)

Comments

@Bobgy
Contributor

Bobgy commented Jun 22, 2020

Postsubmit tests are failing on master: https://github.com/kubeflow/pipelines/commits/master

We need to fix them before the 1.0 release.

Bobgy self-assigned this Jun 22, 2020
@Bobgy
Contributor Author

Bobgy commented Jun 22, 2020

/cc @rmgogogo @jingzhang36

Bobgy added the status/triaged (Whether the issue has been explicitly triaged) label Jun 22, 2020
@Bobgy
Contributor Author

Bobgy commented Jul 1, 2020

We should enable postsubmit tests for release branches too: kubernetes/test-infra#18131

@Bobgy
Contributor Author

Bobgy commented Jul 1, 2020

googleapiclient.errors.HttpError: <HttpError 429 when requesting https://ml.googleapis.com/v1/projects/ml-pipeline-test/models?alt=json returned "Quota failure for project ml-pipeline-test. Quota failure for project 363997316495. Exceeded the max allowed number of models per project: 100. To read more about Cloud ML Engine quota, see https://cloud.google.com/ml-engine/quotas. To read more about Cloud ML Engine quota, see https://cloud.google.com/ml-engine/quotas.">

Found the root cause of the integration test failure: I think we are hitting the model count quota (100 models per project) again.
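The error says the shared test project already holds the maximum of 100 models, which suggests earlier test runs are leaving models behind. One possible mitigation is a periodic cleanup that deletes models older than some cutoff. Below is a minimal, hypothetical sketch of just the selection step. It assumes each model's creation time is already known (for example, a timestamp the test harness embeds in the model name), and the names `stale_models`, `needs_cleanup`, `max_age` and `headroom` are illustrative, not part of the existing test code. Listing and deleting the models through the AI Platform API is left out.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Sequence, Tuple

# Per-project model limit quoted in the HttpError above.
MODEL_QUOTA = 100


def needs_cleanup(model_count: int, quota: int = MODEL_QUOTA, headroom: int = 10) -> bool:
    """True once the project is within `headroom` models of the quota (hypothetical policy)."""
    return model_count >= quota - headroom


def stale_models(
    models: Sequence[Tuple[str, datetime]],
    now: datetime,
    max_age: timedelta = timedelta(days=1),
) -> List[str]:
    """Return names of models created before now - max_age, oldest first.

    `models` holds (name, created_at) pairs; where created_at comes from is
    an assumption of this sketch.
    """
    cutoff = now - max_age
    stale = sorted((m for m in models if m[1] < cutoff), key=lambda m: m[1])
    return [name for name, _ in stale]


if __name__ == "__main__":
    now = datetime(2020, 7, 1, tzinfo=timezone.utc)
    models = [
        ("m_a", now - timedelta(days=3)),
        ("m_b", now - timedelta(hours=1)),
        ("m_c", now - timedelta(days=2)),
    ]
    print(stale_models(models, now))  # ['m_a', 'm_c']
```

Selecting by age rather than simply deleting the oldest N keeps models that an in-flight test run may still be using.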

@Bobgy
Contributor Author

Bobgy commented Jul 1, 2020

Another root cause, for the other two postsubmit tests:

+ for i in $(seq 1 ${PULL_CLOUDBUILD_STATUS_MAX_ATTEMPT})
++ gcloud builds list --project=ml-pipeline-test --filter=sourceProvenance.resolvedRepoSource.commitSha:79e0ee2b492e438e54690a4b5a92b7cce5298497
Listed 0 items.
+ output=
+ [[ '' != '' ]]
+ sleep 20
+ [[ TIMEOUT == TIMEOUT ]]
+ echo 'Wait for cloudbuild job to start, timeout exiting...'
Wait for cloudbuild job to start, timeout exiting...
+ exit 1

It looks like the gcloud builds list filter no longer matches any builds.

EDIT: looking back through the history, the error started when I switched from plain GitHub triggers to GitHub App triggers: https://cloud.google.com/cloud-build/docs/automating-builds/create-github-app-triggers.

Sent a PR to fix this: #4122.
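The trace above is a wait loop: list builds whose source provenance matches the commit SHA, stop as soon as one appears, otherwise sleep 20 seconds and retry up to PULL_CLOUDBUILD_STATUS_MAX_ATTEMPT times, then exit 1 on timeout. Since the filter now matches nothing, the loop can only time out. The Python sketch below restates that logic with the lookup injected as a function, so the retry behaviour can be exercised without gcloud. `wait_for_build` and `gcloud_build_lister` are hypothetical names; the lister takes the filter as a parameter because the corrected filter from #4122 is not reproduced here.

```python
import subprocess
import time
from typing import Callable, List, Optional


def wait_for_build(
    list_builds: Callable[[str], List[str]],
    commit_sha: str,
    max_attempts: int,
    interval_s: float = 20,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Poll until a build for commit_sha shows up; return its ID, or None on timeout."""
    for _ in range(max_attempts):
        builds = list_builds(commit_sha)
        if builds:  # same check as [[ "$output" != "" ]] in the shell trace
            return builds[0]
        sleep(interval_s)
    return None  # the shell script prints a timeout message and exits 1 here


def gcloud_build_lister(project: str, filter_template: str) -> Callable[[str], List[str]]:
    """Return a lister that runs `gcloud builds list` and yields matching build IDs.

    filter_template is formatted with the commit SHA, e.g. the filter from the
    trace: "sourceProvenance.resolvedRepoSource.commitSha:{sha}".
    """
    def list_builds(sha: str) -> List[str]:
        result = subprocess.run(
            ["gcloud", "builds", "list", f"--project={project}",
             f"--filter={filter_template.format(sha=sha)}", "--format=value(id)"],
            check=True, capture_output=True, text=True,
        )
        # "Listed 0 items." goes to stderr, so an empty stdout means no match.
        return result.stdout.split()
    return list_builds
```

Injecting the lister keeps the timeout logic separate from the filter, which is the part that broke when the trigger type changed.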

@Bobgy
Contributor Author

Bobgy commented Jul 1, 2020

@numerology a related question: why does the postsubmit integration test use the presubmit test script? https://github.com/kubernetes/test-infra/blob/27669402474b41125795a652f333379f2aa739e1/config/jobs/kubeflow/kubeflow-postsubmits.yaml#L301

@Bobgy
Contributor Author

Bobgy commented Jul 1, 2020

https://prow.k8s.io/view/gcs/kubernetes-jenkins/logs/kubeflow-pipeline-postsubmit-standalone-component-test/1278290475136061441#1:build-log.txt%3A4102

$ gsutil cat gs://dataproc-788e0848-9dd5-4ea6-9b78-eadcb7f5a23d-us-central1/google-cloud-dataproc-metainfo/885c29f6-a75d-46d6-8a74-7a152c7e0d2e/xgb-74af0310-b67d-47f9-a63d-65c1cb987ce3-w-1/dataproc-initialization-script-0_output
Searching for pip
Reading https://pypi.python.org/simple/pip/
Download error on https://pypi.python.org/simple/pip/: [Errno 101] Network is unreachable -- Some packages may not be found!
Couldn't find index page for 'pip' (maybe misspelled?)
Scanning index of all packages (this may take a while)
Reading https://pypi.python.org/simple/
No local packages or working download links found for pip
error: Could not find suitable distribution for Requirement.parse('pip')

@numerology Do you have any idea what causes this error? I remember you mentioned it could be due to a project/org policy. I can also reproduce this issue when testing in my own project; how can we fix it?

@Bobgy
Contributor Author

Bobgy commented Jul 1, 2020

The error looks exactly like #3661 (comment).

It may be transient, but we should still figure out a fix for it.

@Bobgy
Contributor Author

Bobgy commented Jul 2, 2020

Postsubmit tests are passing on master now!
2e14fe7

@Bobgy
Contributor Author

Bobgy commented Jul 2, 2020

But postsubmit tests still fail on the release-1.0 branch: 7a0df42

I'm also creating a PR against the release-1.0 branch to verify the presubmit tests: #4140

@Bobgy
Contributor Author

Bobgy commented Jul 3, 2020

Verified that the XGBoost sample also runs successfully in rc3.


1 participant