Use hashes for cache keys #780
Comments
Looks like sha256(compressed).
Looking at APKINDEX.tar.gz, we have this entry:
The sha1 of the control segment:
Matches what's in APKINDEX:
So this will dovetail really well with #772: to verify these hashes we need to split the APK into 3 separate gzipped segments anyway, so we might as well cache those segments separately, with these hashes as their keys. Then we can look up by hash and avoid parsing the same thing multiple times.
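A minimal sketch of the splitting step, assuming an APK is a plain concatenation of gzip members (signature, control, data) and that the hashes are taken over the compressed bytes of each member. `splitGzipMembers`, `gzipMember`, and `fakeAPK` are hypothetical names for illustration, not apko functions, and the input is synthetic rather than a real package.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"crypto/sha1"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
)

// splitGzipMembers returns the raw (still compressed) bytes of each gzip
// member in data, in order.
func splitGzipMembers(data []byte) ([][]byte, error) {
	// bytes.Reader implements io.ByteReader, so the gzip reader does not read
	// past the end of a member and br's position marks the member boundary.
	br := bytes.NewReader(data)
	zr, err := gzip.NewReader(br)
	if err != nil {
		return nil, err
	}
	var members [][]byte
	start := 0
	for {
		zr.Multistream(false) // stop at the end of the current member
		if _, err := io.Copy(io.Discard, zr); err != nil {
			return nil, err
		}
		end := len(data) - br.Len()
		members = append(members, data[start:end])
		start = end
		if err := zr.Reset(br); err == io.EOF {
			return members, nil // no further members
		} else if err != nil {
			return nil, err
		}
	}
}

// gzipMember compresses payload into one standalone gzip member (demo data only).
func gzipMember(payload string) []byte {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	zw.Write([]byte(payload))
	zw.Close()
	return buf.Bytes()
}

// fakeAPK concatenates three gzip members to imitate the assumed APK layout.
func fakeAPK() []byte {
	var apk []byte
	for _, part := range []string{"signature", "control", "data"} {
		apk = append(apk, gzipMember(part)...)
	}
	return apk
}

func main() {
	members, err := splitGzipMembers(fakeAPK())
	if err != nil {
		panic(err)
	}
	// Assumption: sha1 over the compressed control member, sha256 over the
	// compressed data member.
	controlSum := sha1.Sum(members[1])
	dataSum := sha256.Sum256(members[2])
	fmt.Println("control sha1:", hex.EncodeToString(controlSum[:]))
	fmt.Println("data sha256: ", hex.EncodeToString(dataSum[:]))
}
```

Working on the compressed bytes means each segment can be written to the cache as-is and hashed in the same pass, without decompressing just to compute a key.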
Looking at the state of my current cache:
I think my initial plan above re: directory structure still makes sense, so I'm going to go with that.
Alright, got this working pretty well, I think. This is after the same run I've been doing building
IIUC, we currently rely on Etags for both the APKINDEX and individual APKs when checking our cache.
Looking at a trace, you can see we are spending ~8ms on every cache hit just sending a HEAD request to get the etag:
This shouldn't be necessary because the APKINDEX is a happy little DAG that has all the information we need.
The `C:` field in APKINDEX contains the checksum of the APK control section.
The `datahash` field in the control section's `.PKGINFO` contains the sha256 of the data section. (TODO: Is this the compressed bytes or uncompressed bytes?)
(Source: https://wiki.alpinelinux.org/wiki/Apk_spec)
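To make the chain concrete, here is a rough sketch of reading both links. It assumes the `C:` value is the prefix `Q1` followed by the base64-encoded SHA-1 digest of the control segment, and that `.PKGINFO` is a list of `key = value` lines. All function names are hypothetical and the inputs are made up for the demo.

```go
package main

import (
	"bufio"
	"bytes"
	"crypto/sha1"
	"encoding/base64"
	"fmt"
	"strings"
)

// decodeChecksum turns an APKINDEX "C:" value ("Q1" + base64) into raw SHA-1 bytes.
func decodeChecksum(c string) ([]byte, error) {
	if !strings.HasPrefix(c, "Q1") {
		return nil, fmt.Errorf("unsupported checksum format %q", c)
	}
	sum, err := base64.StdEncoding.DecodeString(c[2:])
	if err != nil {
		return nil, err
	}
	if len(sum) != sha1.Size {
		return nil, fmt.Errorf("unexpected digest length %d", len(sum))
	}
	return sum, nil
}

// encodeChecksum builds a "C:"-style value for a segment (used for demo data).
func encodeChecksum(segment []byte) string {
	sum := sha1.Sum(segment)
	return "Q1" + base64.StdEncoding.EncodeToString(sum[:])
}

// matchesChecksum reports whether segment hashes to the digest encoded in c.
func matchesChecksum(c string, segment []byte) bool {
	want, err := decodeChecksum(c)
	if err != nil {
		return false
	}
	got := sha1.Sum(segment)
	return bytes.Equal(want, got[:])
}

// dataHash extracts the "datahash" value from the text of a .PKGINFO file.
func dataHash(pkginfo string) (string, bool) {
	sc := bufio.NewScanner(strings.NewReader(pkginfo))
	for sc.Scan() {
		key, value, ok := strings.Cut(sc.Text(), "=")
		if ok && strings.TrimSpace(key) == "datahash" {
			return strings.TrimSpace(value), true
		}
	}
	return "", false
}

func main() {
	control := []byte("pretend control segment") // stand-in bytes, not a real APK
	c := encodeChecksum(control)
	fmt.Println("C:" + c)
	fmt.Println("control matches:", matchesChecksum(c, control))

	h, _ := dataHash("pkgname = example\ndatahash = 0123abcd\n")
	fmt.Println("datahash:", h)
}
```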
The bold edges represent content hashes. Note that we have a chain of them from the APKINDEX all the way to the APK's data section, so we can just use those hashes as keys in our cache to determine if we can reuse an APK.
My initial thought for directory structure would be something like:
Where under APKINDEX we have etag-based filenames, under each APK version we have the control section named after the checksum and the data section named after the datahash. We can just stat the files with the expected name (and re-hash them as we consume them to prevent tampering) to check existence rather than going all the way out to the internet.
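The stat-then-rehash check could look roughly like this. The path layout in `dataPath` (including the `.dat.tar.gz` suffix) is a hypothetical stand-in, not necessarily the structure proposed above, and every helper name is illustrative rather than an apko API.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
	"os"
	"path/filepath"
)

// dataPath is an assumed layout: <root>/<pkg>/<version>/<datahash>.dat.tar.gz
func dataPath(root, pkg, version, dataHash string) string {
	return filepath.Join(root, pkg, version, dataHash+".dat.tar.gz")
}

// exists is the cheap cache check: one stat, no network round trip.
func exists(path string) bool {
	_, err := os.Stat(path)
	return err == nil
}

// readVerified re-hashes the file while reading it and rejects the entry if
// the content no longer matches the hash it is named after.
func readVerified(path, wantHex string) ([]byte, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()
	h := sha256.New()
	b, err := io.ReadAll(io.TeeReader(f, h))
	if err != nil {
		return nil, err
	}
	if got := hex.EncodeToString(h.Sum(nil)); got != wantHex {
		return nil, fmt.Errorf("cache entry %s: sha256 is %s, want %s", path, got, wantHex)
	}
	return b, nil
}

// store writes content under its own sha256 and returns the path and hash.
func store(root, pkg, version string, content []byte) (string, string, error) {
	sum := sha256.Sum256(content)
	hash := hex.EncodeToString(sum[:])
	path := dataPath(root, pkg, version, hash)
	if err := os.MkdirAll(filepath.Dir(path), 0o755); err != nil {
		return "", "", err
	}
	return path, hash, os.WriteFile(path, content, 0o644)
}

// tempRoot creates a throwaway cache root for the demo.
func tempRoot() string {
	dir, err := os.MkdirTemp("", "apkcache")
	if err != nil {
		panic(err)
	}
	return dir
}

func main() {
	root := tempRoot()
	defer os.RemoveAll(root)
	path, hash, err := store(root, "example-pkg", "1.0-r0", []byte("pretend data segment"))
	if err != nil {
		panic(err)
	}
	fmt.Println("cache hit:", exists(path))
	_, err = readVerified(path, hash)
	fmt.Println("verified:", err == nil)
}
```

Because the expected hash comes from the APKINDEX chain rather than from the cache itself, a tampered or truncated file fails the re-hash and can be treated as a miss.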
(Splitting the APK like this will make cache validation much easier but also give us a performance benefit.)