Error: Failed to synchronize cache for repo 'updates' #11452
Comments
@stevekuznetsov fyi |
I've been seeing this failure regularly when I attempt to |
I've added failure cause 'dnf update failure' to jenkins. |
This is either an internet connectivity issue or a mirror issue... @tdawson are we mirroring |
I believe this is a Fedora-only issue. We are not mirroring any plain Fedora repos, only EPEL. |
Yes, this is true. As these types of issues proliferate, I think we need a better strategy for interacting with mirrors in general. Can we reduce the |
@stevekuznetsov I don't think this is a connectivity issue. In addition to these ci failures, I've seen the same error building fedora images locally or via the docker hub. Something is up with the fedora repos. |
I understand -- if we don't build them, we don't have the issue. What are we gaining by re-installing the dependencies in every build? |
I think it's a good idea to build regularly to ensure we catch problems before they impact too many people on the networking team. But that doesn't have to be with every PR. I think a good strategy would be to bake the dind images into the ami and then rely on the extended and post-merge jobs to catch build-related regressions. |
Related pr: #9622 |
not clear from @stevekuznetsov's comment above if what i just hit is this flake or not:
|
but i think it's this one. it's not a specific package not found issue, just a general dnf connection issue. |
The previous error was |
Sure looks like it. Must have been a roll-out of a new version to |
It's interesting that the failure seems to always happen when building the second image; openshift/dind builds successfully (including doing a "dnf update") but then openshift/dind-node fails. Maybe if we drop the "dnf clean all" from the openshift/dind Dockerfile then this bug will magically go away? |
I don't know enough about the environment to say for certain but I would be surprised if the caches or other |
openshift/dind-node is built "FROM openshift/dind", so its "RUN dnf -y update" is running against whatever state the openshift/dind build left the dnf caches in. Obviously this shouldn't actually be a problem, but if there was some bug in dnf's regenerate-caches-from-scratch code, it wouldn't get seen much in normal operation (since people don't normally "dnf clean all"), so that might explain why we see this problem all the time but ordinary Fedora users don't. |
Ah, I see what you mean. Was it there just to reduce the size of the image? Seems reasonable to remove it. |
There's no cache in place. What we usually do in all our images is clean the cache after installation, see here. |
OK, nvmd, I just noticed the PR removing that 🤦‍♂️ |
|
Same issue here:
Strangely it works on my Desktop's Docker:
|
@levysantanna this is a transient failure, no reason to expect you'd be able to reproduce it |
This is not a transient failure. I have this error today on one dev machine with one Docker image but not a different machine with the same Docker image...this behavior appears weird... |
The error appears to happen when there's some specific sort of problem on one of the fedora mirrors, which will then cause every "yum update" that hits that mirror to fail until eventually the mirror resyncs with the masters and fixes things. (If you look through the past instances of the flake, it tends to happen in bursts; it will happen 5 or 10 times in one day, and then not at all for a few weeks or months.) If you reliably see it on one machine and not on another (at a given time), it's just because of DNS caching; one of them has resolved "mirrors.fedoraproject.org" to the mirror that has the problem, and the other has resolved it to a different mirror. If you can actually figure out which mirror is having the problems then filing a bug against the fedora infrastructure and/or emailing the maintainer of that mirror might help them figure out exactly what causes this... |
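If the failure really does depend on which mirror a given run happens to hit, one possible mitigation is to retry the update a few times with a pause in between. The following is only a sketch under that assumption: `retry` is a hypothetical helper, not something from the origin build scripts, and a retry on the same machine may well land on the same bad mirror, so it is not guaranteed to help.

```shell
#!/bin/sh
# Hypothetical helper (not part of the origin repo): run a command up to
# MAX_ATTEMPTS times, sleeping DELAY seconds between failed attempts.
# Usage: retry MAX_ATTEMPTS DELAY command [args...]
retry() {
    max_attempts=$1
    delay=$2
    shift 2
    attempt=1
    while :; do
        # Run the command; stop as soon as it succeeds.
        if "$@"; then
            return 0
        fi
        # Give up once the attempt budget is used.
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "retry: giving up after $attempt attempts: $*" >&2
            return 1
        fi
        echo "retry: attempt $attempt failed, retrying in ${delay}s" >&2
        sleep "$delay"
        attempt=$((attempt + 1))
    done
}

# Example invocation (assumption, not run here):
#   retry 3 30 dnf -y update
```

In a Dockerfile this would have to be inlined into the RUN step or copied in as a script. The attempt count and delay above are illustrative values, not tuned ones.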
Issues go stale after 90d of inactivity. Mark the issue as fresh by commenting If this issue is safe to close now please do so with /lifecycle stale |
We've pruned out dependencies on /close |
Still having this issue, works fine on my desktop, "Error: Failed to synchronize cache for repo 'updates'" on the server. |
Parent issue for discussion: #8571
====
Not sure if this is the same as the yum failures, but opening just in case it's different
https://ci.openshift.redhat.com/jenkins/job/test_pull_requests_origin_networking/277/