
Error: Failed to synchronize cache for repo 'updates' #11452

Closed
csrwng opened this issue Oct 19, 2016 · 35 comments
Labels: area/tests, kind/test-flake, lifecycle/stale, priority/P2

Comments

@csrwng (Contributor) commented Oct 19, 2016

Parent issue for discussion: #8571

====

Not sure if this is the same as the yum failures, but opening just in case it's different:

https://ci.openshift.redhat.com/jenkins/job/test_pull_requests_origin_networking/277/

csrwng added the priority/P2, area/tests, and kind/test-flake labels on Oct 19, 2016
@csrwng (Contributor, Author) commented Oct 19, 2016

@stevekuznetsov fyi

@marun (Contributor) commented Oct 21, 2016

I've been seeing this failure regularly when I attempt to dnf -y update fedora24 images.

@marun (Contributor) commented Oct 21, 2016

I've added the failure cause 'dnf update failure' to Jenkins.

@stevekuznetsov (Contributor)

This is either an internet connectivity issue or a mirror issue... @tdawson are we mirroring @updates internally somewhere we can use?

@tdawson (Member) commented Oct 21, 2016

I believe this is a Fedora-only issue. We are not mirroring any plain Fedora repos, only EPEL.

@stevekuznetsov (Contributor)

> I believe this is a Fedora-only issue

Yes, this is true.

As these types of issues proliferate, I think we need a better strategy for interacting with mirrors in general. Can we reduce the yum traffic? When we're building the DIND images, do we really need to be doing the installs every time? Why can't we just layer the code/variant bits on top and update the base image with OS dependencies once a week or so? @marun
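
A minimal sketch of the layering being suggested here, assuming a hypothetical base image name and an illustrative package list (these are not the actual origin Dockerfiles):

    # Dockerfile for a weekly rebuilt base image carrying the OS
    # dependencies; "openshift/dind-base" and the packages are illustrative.
    FROM fedora:24
    RUN dnf -y update && \
        dnf -y install iproute iptables-services openvswitch && \
        dnf clean all

    # Dockerfile for the per-PR image (a separate file): it only layers
    # freshly built binaries on top, so the hot path generates no
    # dnf/mirror traffic at all.
    FROM openshift/dind-base
    COPY _output/local/bin/linux/amd64/ /usr/local/bin/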

@marun (Contributor) commented Oct 21, 2016

@stevekuznetsov I don't think this is a connectivity issue. In addition to these CI failures, I've seen the same error building Fedora images locally or via the Docker Hub. Something is up with the Fedora repos.

@stevekuznetsov (Contributor)

I understand -- if we don't build them, we don't have the issue. What are we gaining by re-installing the dependencies in every build?

@marun (Contributor) commented Oct 21, 2016

I think it's a good idea to build regularly to ensure we catch problems before they impact too many people on the networking team, but that doesn't have to be with every PR. I think a good strategy would be to bake the dind images into the AMI and then rely on the extended and post-merge jobs to catch build-related regressions.

@marun (Contributor) commented Oct 21, 2016

Related PR: #9622

@bparees (Contributor) commented Apr 28, 2017

Not clear from @stevekuznetsov's comment above if what I just hit is this flake or not:

https://ci.openshift.redhat.com/jenkins/job/test_pull_request_origin_extended_networking_minimal/1529

Step 2 : RUN dnf -y update && dnf -y install bind-utils findutils hostname iproute iputils less procps-ng tar which bridge-utils ethtool iptables-services openvswitch && dnf clean all
 ---> Running in a3eed2d6d478
Error: Failed to synchronize cache for repo 'updates'
The command '/bin/sh -c dnf -y update && dnf -y install bind-utils findutils hostname iproute iputils less procps-ng tar which bridge-utils ethtool iptables-services openvswitch && dnf clean all' returned a non-zero code: 1
[ERROR] PID 16368: hack/dind-cluster.sh:276: `${DOCKER_CMD} build -t "${image_name}" .` exited with status 1.
[INFO] 		Stack Trace: 
[INFO] 		  1: hack/dind-cluster.sh:276: `${DOCKER_CMD} build -t "${image_name}" .`
[INFO] 		  2: hack/dind-cluster.sh:267: build-image
[INFO] 		  3: hack/dind-cluster.sh:372: build-images
[INFO]   Exiting with code 1.
[ERROR] PID 906: test/extended/networking.sh:350: `${CLUSTER_CMD} build-images` exited with status 1.
[INFO] 		Stack Trace: 
[INFO] 		  1: test/extended/networking.sh:350: `${CLUSTER_CMD} build-images`
[INFO]   Exiting with code 1.
/data/src/github.com/openshift/origin/hack/lib/log/system.sh: line 31: 16363 Terminated              sar -A -o "${binary_logfile}" 1 86400 > /dev/null 2> "${stderr_logfile}"
[ERROR] PID 853: test/extended/networking-minimal.sh:6: `NETWORKING_E2E_MINIMAL=1 "${OS_ROOT}/test/extended/networking.sh"` exited with status 1.
[INFO] 		Stack Trace: 
[INFO] 		  1: test/extended/networking-minimal.sh:6: `NETWORKING_E2E_MINIMAL=1 "${OS_ROOT}/test/extended/networking.sh"`
[INFO]   Exiting with code 1.
make: *** [test-extended] Error 1
++ export status=FAILURE
++ status=FAILURE
+ set +o xtrace
########## FINISHED STAGE: FAILURE: RUN EXTENDED TESTS ##########

@bparees (Contributor) commented Apr 28, 2017

But I think it's this one. It's not a specific package-not-found issue, just a general dnf connection issue.

@stevekuznetsov (Contributor)

The previous error was "06:44:44 Error: No packages marked for removal.", which is why I pointed him at a different issue. The logs you posted show the dnf issue that is really being tracked here.

@deads2k (Contributor) commented Apr 28, 2017

@stevekuznetsov (Contributor)

Sure looks like it. Must have been a roll-out of a new version to @updates recently. Until we serve RPMs from our mirrors and our mirrors only (@gmontero) inside and outside of container builds, we'll continue seeing this forever. Or we could try to change the yum backend to be more graceful here.
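
For reference, a sketch of what serving RPMs from internal mirrors only could look like at the repo-definition level; the mirror URL is a placeholder, and skip_if_unavailable is a standard dnf repo option rather than anything this repo is known to set:

    # /etc/yum.repos.d/fedora-updates.repo (sketch; the URL is a placeholder)
    [updates]
    name=Fedora $releasever - $basearch - Updates (internal mirror)
    baseurl=http://mirror.internal.example.com/fedora/updates/$releasever/$basearch/
    enabled=1
    gpgcheck=1
    # Soften transient failures instead of failing the whole transaction:
    skip_if_unavailable=1

Global retry and timeout behavior can additionally be tuned in the [main] section of /etc/dnf/dnf.conf.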

@danwinship (Contributor)

It's interesting that the failure seems to always happen when building the second image: openshift/dind builds successfully (including doing a "dnf update"), but then openshift/dind-node fails. Maybe if we drop the "dnf clean all" from the openshift/dind Dockerfile, this bug will magically go away?

@stevekuznetsov (Contributor)

I don't know enough about the environment to say for certain but I would be surprised if the caches or other dnf data were actually interacting between the two builds. @smarterclayton would you expect that sort of cross-pollination to be possible?

@danwinship (Contributor)

openshift/dind-node is built "FROM openshift/dind", so its "RUN dnf -y update" runs against whatever state the openshift/dind build left the dnf caches in. Obviously this shouldn't be a problem, but if there were a bug in dnf's regenerate-caches-from-scratch code, it wouldn't get seen much in normal operation (people don't normally run "dnf clean all"), which might explain why we see this problem all the time while ordinary Fedora users don't.
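
A simplified sketch of the image chain being described here (package lists trimmed; these are not the exact origin Dockerfiles):

    # Dockerfile for openshift/dind (simplified): the trailing
    # "dnf clean all" wipes the metadata caches from this layer.
    FROM fedora:24
    RUN dnf -y update && dnf -y install iproute openvswitch && dnf clean all

    # Dockerfile for openshift/dind-node (simplified): its "dnf -y update"
    # therefore has to regenerate the caches from scratch, which is the
    # code path suspected above.
    FROM openshift/dind
    RUN dnf -y update && dnf -y install bind-utils iptables-services && dnf clean all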

@stevekuznetsov (Contributor)

Ah, I see what you mean. Was it there just to reduce the size of the image? Seems reasonable to remove it.

@soltysh (Contributor) commented Apr 28, 2017

There's no cache in place; what we usually do in all our images is clean the cache after installation, see here.

@soltysh (Contributor) commented Apr 28, 2017

OK, never mind, I just noticed the PR removing that 🤦‍♂️

@bparees (Contributor) commented May 4, 2017

https://ci.openshift.redhat.com/jenkins/job/merge_pull_request_origin/548/consoleFull#82220402358b6e51eb7608a5981914356

Error: Failed to synchronize cache for repo 'updates'
The command '/bin/sh -c dnf -y update && dnf -y install docker glibc-langpack-en iptables openssh-clients openssh-server' returned a non-zero code: 1
[ERROR] PID 16703: hack/dind-cluster.sh:276: `${DOCKER_CMD} build -t "${image_name}" .` exited with status 1.
[INFO] 		Stack Trace: 
[INFO] 		  1: hack/dind-cluster.sh:276: `${DOCKER_CMD} build -t "${image_name}" .`
[INFO] 		  2: hack/dind-cluster.sh:266: build-image
[INFO] 		  3: hack/dind-cluster.sh:372: build-images
[INFO]   Exiting with code 1.
[ERROR] PID 1183: test/extended/networking.sh:350: `${CLUSTER_CMD} build-images` exited with status 1.
[INFO] 		Stack Trace: 
[INFO] 		  1: test/extended/networking.sh:350: `${CLUSTER_CMD} build-images`
[INFO]   Exiting with code 1.
/data/src/github.com/openshift/origin/hack/lib/log/system.sh: line 31: 16698 Terminated              sar -A -o "${binary_logfile}" 1 86400 > /dev/null 2> "${stderr_logfile}"
[ERROR] PID 1130: test/extended/networking-minimal.sh:6: `NETWORKING_E2E_MINIMAL=1 "${OS_ROOT}/test/extended/networking.sh"` exited with status 1.
[INFO] 		Stack Trace: 
[INFO] 		  1: test/extended/networking-minimal.sh:6: `NETWORKING_E2E_MINIMAL=1 "${OS_ROOT}/test/extended/networking.sh"`
[INFO]   Exiting with code 1.
make: *** [test-extended] Error 1

@levysantanna commented May 17, 2017

Same issue here:

Error: Failed to synchronize cache for repo 'fedora'
error: build error: The command '/bin/sh -c dnf update -y --releasever=25' returned a non-zero code: 1

Strangely, it works in Docker on my desktop:

Step 5 : RUN dnf update -y --releasever=25
 ---> Running in 1c14b7622cac
Last metadata expiration check: 0:00:46 ago on Wed May 17 14:42:19 2017.
Dependencies resolved.
================================================================================
 Package                       Arch     Version                 Repository
                                                                           Size
================================================================================
Upgrading:
 audit-libs                    x86_64   2.7.6-1.fc25            updates   107 k
 ca-certificates               noarch   2017.2.14-1.0.fc25      updates   477 k
 coreutils                     x86_64   8.25-17.fc25            updates   1.1 M
 coreutils-common              x86_64   8.25-17.fc25            updates   1.9 M

@stevekuznetsov (Contributor)

@levysantanna this is a transient failure; there's no reason to expect you'd be able to reproduce it.
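
For anyone hitting this in their own builds, a generic workaround for transient repo failures is to retry the dnf step with a backoff; this is only a sketch, not something the origin scripts do:

    # Retry "dnf -y update" a few times before giving up.
    ok=0
    for attempt in 1 2 3; do
        if dnf -y update; then
            ok=1
            break
        fi
        echo "dnf update failed (attempt ${attempt}); retrying in 30s..." >&2
        sleep 30
    done
    [ "${ok}" -eq 1 ] || exit 1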

@zopyx commented Jul 6, 2017

This is not a transient failure. I am seeing this error today on one dev machine with one Docker image, but not on a different machine with the same Docker image... this behavior appears weird.

@danwinship (Contributor)

The error appears to happen when there's some specific sort of problem on one of the fedora mirrors, which will then cause every "yum update" that hits that mirror to fail until eventually the mirror resyncs with the masters and fixes things. (If you look through the past instances of the flake, it tends to happen in bursts; it will happen 5 or 10 times in one day, and then not at all for a few weeks or months.)

If you reliably see it on one machine and not on another (at a given time), it's just because of DNS caching: one of them has resolved "mirrors.fedoraproject.org" to the mirror that has the problem, and the other has resolved it to a different mirror. If you can actually figure out which mirror is having the problems, then filing a bug against the Fedora infrastructure and/or emailing the maintainer of that mirror might help them figure out exactly what causes this...
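
A sketch of how one might pin down which mirror a given machine resolved to; the metalink URL below is the standard Fedora one, with f25 used as an example release:

    # Rebuild the metadata verbosely; dnf prints the repo/mirror URLs it
    # tries as it goes (exact output varies by dnf version).
    dnf -v makecache 2>&1 | grep -iE 'mirror|baseurl'

    # Or fetch the metalink directly to see which mirrors
    # mirrors.fedoraproject.org is currently handing out:
    curl -s 'https://mirrors.fedoraproject.org/metalink?repo=updates-released-f25&arch=x86_64' \
        | grep -o 'http[^"<]*repomd.xml' | head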

@openshift-bot (Contributor)

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

openshift-ci-robot added the lifecycle/stale label on Feb 13, 2018
@stevekuznetsov (Contributor)

We've pruned out dependencies on @updates and @epel, so this should be fixed.

/close

@nicklasring

Still having this issue: it works fine on my desktop, but I get "Error: Failed to synchronize cache for repo 'updates'" on the server.
