Checksum failed for nightly channel #524
Someone was also reporting this yesterday: a checksum failure in the manifests. Definitely seems fishy, as the toolchains especially should never have bad sha256 sums, right @brson? While this failure was happening, the nightly-dist-packaging-mac, nightly-dist-packaging-win-gnu-32, and nightly-dist-packaging-win-gnu-64 builders were all running. I don't think that should affect this, though, because the new TOML manifest wouldn't have been uploaded yet, so none of the new artifacts should have been used. I also checked, and we had no cloudfront invalidations in flight when this happened.
Just a quick update: The expected checksum changed but it still does not match:
Ok I finally found the problem! I found a folder called
Whoa! Sounds like we may not be invalidating a cache somewhere; definitely seems like a bug, though.
That someone was me, and the checksums in the report today are the same checksums I was seeing yesterday, which seems bad.
Oh cool, I still have the files in
Right! If you remove the old tmp files, does an update work for you, @kamalmarhubi?
@alexcrichton I believe right now it's possible to get checksum drift on the manifests. The self-update will just warn, but if the manifests don't agree with their .sha256 file it is still an error.
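For context, a minimal sketch of the kind of check being described here, verifying a downloaded manifest against its companion .sha256 file. The function names and the use of the sha2 crate are assumptions for illustration, not rustup's actual code:

```rust
use sha2::{Digest, Sha256};

fn sha256_hex(data: &[u8]) -> String {
    Sha256::digest(data)
        .iter()
        .map(|b| format!("{:02x}", b))
        .collect()
}

/// `sha256_file` is the text of the companion `.sha256` file, which holds a
/// line like "<hex digest>  channel-rust-nightly.toml".
fn manifest_matches(manifest: &[u8], sha256_file: &str) -> bool {
    let expected = sha256_file.split_whitespace().next().unwrap_or("");
    sha256_hex(manifest) == expected
}

fn main() {
    let manifest = b"[pkg.rust]\n"; // stand-in bytes, not a real manifest
    let sha_file = format!("{}  channel-rust-nightly.toml", sha256_hex(manifest));
    // A mismatch here is the hard error reported in this issue; the
    // self-update path only warns on the equivalent check.
    println!("manifest matches .sha256 file: {}", manifest_matches(manifest, &sha_file));
}
```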
I'm surprised @kamalmarhubi has manifests sitting directly in their /tmp folder. rustup should be using … I wouldn't expect the contents of …
@brson I should have mentioned: those were files I downloaded directly from the dist site while investigating the issue. Basically, I was verifying that the mismatch I saw on Thursday was not from weird local caching. Things work now, independently of those files being in /tmp.
Is this something that can be fixed? The drift I saw lasted for at least tens of minutes, and up to many hours depending on when the OP saw those same checksums. That seems a bit long. I'm not sure what the serving infrastructure behind cloudfront is. Is it immediately backed by S3? I'm trying to figure out what's making this hard, and what could be done to change it.
Yeah, I'm curious how checksum drift is possible in the manifests. I thought it was only possible in the small window where an invalidation is in flight (or we're in the middle of an upload), but when this happened I confirmed that neither of those was happening.
Let's switch both the self-update and manifest checksum checks to use the HTTP etags.

To facilitate the upgrade we'll need to maintain both code paths. We'll change the metadata format to store etags, like we currently do with update hashes. If there's an update hash (but no stored etag) for a particular artifact, use the old code path; otherwise use the etag path.

The way we store etags will need to be slightly different from update hashes. Right now we don't store update hashes for the self-updates - we just calculate them from the running bin. So we'll need to store them in a format that supports etags for self-updates or channels. Maybe it makes sense to make it a hash table of URLs to etags. I don't think this will require a metadata version bump.
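A minimal sketch of the bookkeeping this would need, using hypothetical type and field names (rustup's real metadata format and download code differ); the HTTP layer is left out and only the "which path do we take" decision is shown:

```rust
use std::collections::HashMap;

/// Per-installation metadata (sketch): alongside the legacy update hashes we
/// keep a table of URL -> last-seen ETag.
#[derive(Default)]
struct Metadata {
    update_hashes: HashMap<String, String>, // legacy: URL -> sha256 of last download
    etags: HashMap<String, String>,         // new: URL -> ETag from the last response
}

/// Which code path to take for a given artifact.
enum CheckPath {
    /// No stored ETag yet: fall back to downloading and comparing hashes.
    UpdateHash(Option<String>),
    /// Stored ETag available: do a conditional GET with If-None-Match.
    Etag(String),
}

impl Metadata {
    fn path_for(&self, url: &str) -> CheckPath {
        match self.etags.get(url) {
            Some(etag) => CheckPath::Etag(etag.clone()),
            None => CheckPath::UpdateHash(self.update_hashes.get(url).cloned()),
        }
    }

    /// After a successful download, remember the ETag so later checks take
    /// the new path. Works uniformly for self-updates and channel manifests.
    fn record_etag(&mut self, url: &str, etag: &str) {
        self.etags.insert(url.to_string(), etag.to_string());
    }
}

fn main() {
    let mut md = Metadata::default();
    let url = "https://static.rust-lang.org/dist/channel-rust-nightly.toml";
    assert!(matches!(md.path_for(url), CheckPath::UpdateHash(None)));
    md.record_etag(url, "\"abc123\""); // the ETag value here is made up
    assert!(matches!(md.path_for(url), CheckPath::Etag(_)));
    println!("etag bookkeeping sketch ok");
}
```

Keying the table on full URLs means the same mechanism covers both the self-update binary and the channel manifests, which is what makes the single hash table attractive.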
It happens here, but only on some computers:
While others are fine. EDIT: I guess it was a temporary server-side issue? Anyway, I didn't change much, but it works again.
@gyscos It's a temporary issue with our servers.
Also having this issue:
Removing
Just ran into this on FreeBSD as well:
I've got it on OS X as well. There's nothing cargo-y in
Another +1 -
Works fine now, at least.
Another temporary remediation we could make here: like with self-updates, when we see that the manifest checksum doesn't match, we can print a warning that the update is not yet available and to try again later. It sucks, but at least it would be less alarming, and people would know what to do about it.
@AtheMathmo it's only slightly more complicated than that. That error is emitted in several places, and only one of them is causing this issue.
I'm happy to make this change in a few hours. Thanks for the info!
@brson - could you give a little more information about emitting the info diagnostic? Should I add a new value to the Notification enum -
@AtheMathmo yes, that enum will need a new variant. I'd name it something more indicative of its hacky nature, like
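For reference, a sketch of the shape this change could take, with hypothetical names throughout (the suggested variant name was elided above, and rustup's real Notification enum and notification levels differ):

```rust
use std::fmt;

enum NotificationLevel {
    Info,
    Warn,
}

/// Hypothetical notification type; the real enum has many more variants.
enum Notification<'a> {
    DownloadingManifest(&'a str),
    /// New variant for the temporary hack: the manifest checksum didn't
    /// match, which we now report as "not yet available" rather than failing.
    ManifestChecksumFailedHack(&'a str),
}

impl<'a> Notification<'a> {
    fn level(&self) -> NotificationLevel {
        match self {
            Notification::DownloadingManifest(_) => NotificationLevel::Info,
            Notification::ManifestChecksumFailedHack(_) => NotificationLevel::Warn,
        }
    }
}

impl<'a> fmt::Display for Notification<'a> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Notification::DownloadingManifest(name) => {
                write!(f, "downloading manifest for '{}'", name)
            }
            Notification::ManifestChecksumFailedHack(name) => {
                write!(f, "update not yet available for '{}', try again later", name)
            }
        }
    }
}

fn main() {
    let n = Notification::ManifestChecksumFailedHack("nightly");
    match n.level() {
        NotificationLevel::Info => println!("info: {}", n),
        NotificationLevel::Warn => eprintln!("warning: {}", n),
    }
}
```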
@brson - looks good, thanks! :D
@AtheMathmo made the quick fix. Thanks! Leaving this open for the full solution.
@brson something fishy seems to be happening right now? On AppVeyor I'm getting this error currently, but on a build scheduled half an hour earlier the checksum error didn't happen. There's no way we can just chalk this all up to cloudfront, right?
@alexcrichton yes, I agree that the observations don't support the hypothesis that this is just due to drift in cloudfront invalidations.
Am getting this:
Fixed itself.
In the discussion of signature validation I mentioned in passing my current preferred solution. Basically, we add another layer of indirection: a "current.toml" file that acts like a symlink. It contains two or three keys: one is the name of the current archive directory, another is a checksum of the previous field, and a possible third is a signature of that same field. We do this for both rustup and the rust manifests. I do plan to get around to this soon.

I prefer this to attempting to wrangle cloudfront/s3 into doing something approximating symlinks because it can be done without depending on specifics of the rust distribution infrastructure.
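A sketch of what such a pointer file and its consistency check might look like, with hypothetical field names since the real layout was still being designed; the sha2 crate is assumed for hashing:

```rust
use sha2::{Digest, Sha256};

/// Contents of the proposed "current.toml" pointer file (field names made up).
struct Current {
    /// Name of the archive directory that is currently "live".
    archive: String,
    /// Hex SHA-256 of the `archive` field, so the pointer is self-checking.
    archive_sha256: String,
    /// Optional detached signature over the same field.
    archive_signature: Option<String>,
}

fn sha256_hex(data: &[u8]) -> String {
    Sha256::digest(data)
        .iter()
        .map(|b| format!("{:02x}", b))
        .collect()
}

impl Current {
    /// The pointer and its checksum travel in one file, so a client either
    /// sees a consistent pair or a parse/verify failure -- never two files
    /// drifting apart on the CDN.
    fn is_consistent(&self) -> bool {
        sha256_hex(self.archive.as_bytes()) == self.archive_sha256
    }
}

fn main() {
    let current = Current {
        archive: "2016-03-08".to_string(), // hypothetical archive name
        archive_sha256: sha256_hex(b"2016-03-08"),
        archive_signature: None,
    };
    println!("signed: {}", current.archive_signature.is_some());
    println!("consistent: {}", current.is_consistent());
}
```

The design choice being argued for: the only thing that gets overwritten in place is this one small file, and it carries its own checksum, so the "two files must agree" failure mode goes away.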
@brson Will that actually fix the problem though? What if "current.toml" is updated, but rustup can't access the files it points to because it doesn't see them yet? Maybe that can't happen, but unless we know exactly what's causing the current failures it seems a bit premature to switch to this method with no guarantee it will help.
Well, I think it is likely to fix the problem. Even though some of the windows of time over which the mismatches persist seem longer than we'd expect given what we know about the CDN, it does seem that the problem is due to there being two separately overwritten files that must agree with each other.
Temporary hack to avoid issues with consistent checksumming on the CDN. Issue rust-lang#524
This changes the update process in the following ways: the current version is read from the server at /rustup/stable-release.toml; if the version is different from the running version then rustup downloads the new release from the archives at /rustup/archive/$version/. Fixes rust-lang#524
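A rough sketch of the flow that commit message describes, under stated assumptions: the host, the contents of stable-release.toml, and the naive line-based parsing below are illustrative only, not the actual implementation.

```rust
// Stand-in for the running rustup's own version.
const CURRENT_VERSION: &str = "0.1.7";

/// Pull the `version = "..."` line out of the fetched stable-release.toml.
/// (Assumes the file contains a line exactly in that form.)
fn latest_version(stable_release_toml: &str) -> Option<String> {
    stable_release_toml
        .lines()
        .find_map(|l| l.trim().strip_prefix("version = "))
        .map(|v| v.trim_matches('"').to_string())
}

/// Archive location for a given version, mirroring /rustup/archive/$version/
/// from the commit message (host assumed).
fn archive_url(version: &str) -> String {
    format!("https://static.rust-lang.org/rustup/archive/{}/", version)
}

fn main() {
    // Stand-in for the body fetched from /rustup/stable-release.toml.
    let body = "schema-version = \"1\"\nversion = \"0.1.8\"\n";
    match latest_version(body) {
        Some(v) if v != CURRENT_VERSION => {
            println!("downloading new release from {}", archive_url(&v))
        }
        Some(_) => println!("already up to date"),
        None => eprintln!("could not parse stable-release.toml"),
    }
}
```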
I've been unable to update past the
I just moved from multirust to rustup and experienced the following problem when installing nightly:
Stable and beta install just fine. I also removed all remnants of multirust from my system (removed ~/.multirust, cleared $PATH, fresh console) and tried again; it didn't help.
To clarify, I do not have any version of nightly currently installed (it seems to work for people with nightly already installed):
I am using the current version of rustup: