Only rely on shasum for dependency cache hit #233
I agree with the premise here. The primary purpose is to cache the downloaded dependency, and its shasum should be sufficient to determine whether the resource needs to be fetched. That said, since this is a very critical piece of code, this review will take a bit longer. I need to do a lot of testing to be satisfied that this isn't going to break something.
A couple of minor suggestions included for now.
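To make the premise concrete, here is a minimal sketch contrasting the two cache-hit rules discussed in this PR: comparing all recorded metadata versus comparing only the SHA-256. `cachedDependency`, `hitByMetadata` and `hitBySHA` are hypothetical names invented for illustration. They are not libpak's `BuildpackDependency` or its actual lookup code.

```go
package main

import (
	"fmt"
	"reflect"
	"time"
)

// cachedDependency is a hypothetical, trimmed-down stand-in for the metadata
// stored next to a cached download. Field names are illustrative only.
type cachedDependency struct {
	ID              string
	Version         string
	SHA256          string
	DeprecationDate time.Time
}

// hitByMetadata models the existing rule: reuse the cache only if every
// recorded field matches the requested dependency.
func hitByMetadata(cached, requested cachedDependency) bool {
	return reflect.DeepEqual(cached, requested)
}

// hitBySHA models the proposed rule: the SHA-256 alone identifies the binary.
func hitBySHA(cached, requested cachedDependency) bool {
	return cached.SHA256 == requested.SHA256
}

func main() {
	cached := cachedDependency{
		ID: "example-dependency", Version: "1.0.0", SHA256: "abc123",
		DeprecationDate: time.Date(2030, time.January, 1, 0, 0, 0, 0, time.UTC),
	}
	// A metadata-only change: same binary, different deprecation date.
	requested := cached
	requested.DeprecationDate = time.Date(2031, time.January, 1, 0, 0, 0, 0, time.UTC)

	fmt.Println(hitByMetadata(cached, requested)) // false: forces a redownload
	fmt.Println(hitBySHA(cached, requested))      // true: cached binary is reused
}
```

This also shows the behavioral difference raised later in the thread: under the metadata rule, a metadata-only change triggers a redownload, while under the SHA rule it does not.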
dependency_cache.go

```diff
@@ -131,29 +129,27 @@ func (d *DependencyCache) Artifact(dependency BuildpackDependency, mods ...Reque
 }

 file = filepath.Join(d.CachePath, fmt.Sprintf("%s.toml", dependency.SHA256))
-b, err := ioutil.ReadFile(file)
+_, err := ioutil.ReadFile(file)
```
It looks like we are just checking if the file exists, so we can use https://github.com/paketo-buildpacks/libpak/blob/main/sherpa/exists.go
dependency_cache.go

```diff
 d.Logger.Bodyf("%s cached download from buildpack", color.GreenString("Reusing"))
 return os.Open(filepath.Join(d.CachePath, dependency.SHA256, filepath.Base(uri)))
 }

 file = filepath.Join(d.DownloadPath, fmt.Sprintf("%s.toml", dependency.SHA256))
-b, err = ioutil.ReadFile(file)
+_, err = ioutil.ReadFile(file)
```
Same comment as above, https://github.com/paketo-buildpacks/libpak/blob/main/sherpa/exists.go.
e9a1b28 to 2d7fb31
Thanks, much appreciated. If tests show that there is some reason for this (which I doubt), we could add a unit test to document it.
Looking at this some more, there is definitely an issue with the deprecation date. I'm seeing it trigger a reload of the layer at times when it should not. The problem appears when the expected and actual layer metadata are compared: the deprecation date has a timezone in one case and no timezone in the other. I suspect this is because two different TOML libraries are in use and each handles dates slightly differently. As a result, the deep-equality comparison fails and the layer metadata doesn't match. I'd expect the same thing can happen with artifact downloads, since that code does a similar comparison.

My concern with this proposal is that it slightly changes behavior. If you make a metadata-only change to a dependency, the current implementation detects that change and redownloads the binary; this proposal would not. Since this has been the library's behavior forever, it's really hard for us to tell whether someone depends on it in their buildpacks. I've been trying to think of cases where one might make a metadata-only change. The ones that come to mind are when something gets published with the wrong metadata, or when you're developing and changing the dependency metadata. At any rate, what I'd like to do is this:
But isn't that normal for a bugfix?
Yes, the very same binary with the exact same
What would be a use case?
Deprecation dates might change, right?
tbh, I don't see the point, since to me this is a bug. It broke my offline buildpack, and judging by the linked issue, I am not the only one running into this strange behavior. But it is of course your call.
Sounds good, but that seems independent to me.
But fiddling with the metadata instead of fixing the root problem sounds strange to me, tbh. I would still like to understand why the metadata was compared in the first place.
I have a hard time thinking of any way someone could depend on this. Is there any way to perceive the change at all? It's about getting a binary, isn't it? Whatever the implementation, in the end you have a binary with a specific sha256. I cannot come up with any way someone could depend on whether that binary was taken from the cache or downloaded.
A
Not created yet. What we'll probably do is branch for 1.x maintenance and make
In a perfect world, I'd say yes. How that happens shouldn't matter: whether it compares all the metadata or just the shasum, at the end of the day you should get the same binary. I'll admit I'm looking at this pessimistically. Abstractions can be leaky, though, and this wouldn't be the first time we've tried to change something only to find that the abstraction wasn't good enough and people were depending on the behavior of the implementation. Given the age of v1, and the fact that we're planning v2, where we can easily make this change without disrupting folks, I'd prefer to go that route and not take the risk of changing the implementation. In addition, you said: Regarding the cache misses, can you elaborate on when you're seeing them? Is this happening just because of the deprecation date, or because of other metadata? Which buildpacks and dependencies does it impact? Are these cache misses reproducible? If so, what steps could I take to look more closely at it? Thanks
Sure, solving the concrete case for v1 would help. I was just really curious, and I had (and still have) doubts that the additional effort (fixing the root cause in v2 plus a band-aid in v1) is justified. But it's your call, of course.
That was not as straightforward as I expected due to the combination of using
@c0d1ngm0nk3y Thank you so much for putting that together. Working with I 100% want to move to just comparing the sha hash, but in v2. In v1, I'm optimistic we can make this work with a bit less risk of changing behavior.
I created #266 for this.
I had a look and I will create a dedicated PR for
Fixed with #240
see #266
Fixes #167

Summary
When checking whether a dependency can be reused, it should be enough to check the `shasum`. There is no need to verify that the metadata is the same, because a matching SHA-256 already means the binary is the same.

Use Cases
This caused cache misses when the metadata differed (e.g. `DeprecationDate`).

Checklist