-
Notifications
You must be signed in to change notification settings - Fork 286
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update cache to use tamper proofing #1847
Conversation
Update the buf CLI's cache to take advantage of tamper proofing features (manifest, blobs, and digests) to store files as content addressable storage. When enabled, the new cache directory is stored as: ~/.cache/buf/v1/module cas/ blobs/ {digest[:2]}/ {digest[2:] commits/ {commit} => The manifest hex digest manifests/ {digest[:2]}/ {digest[2:]
Great! I'll take a loot at this work. Some questions:
|
|
||
import "sync" | ||
|
||
type cacheStats struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Broke this out into a separate struct so it could be re-used across implementations.
"go.uber.org/zap" | ||
) | ||
|
||
func warnIfDeprecated( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Broke this out into a utility method that could be called by both implementations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some preliminary comments - some may be off-base, this is just a quick pass and I have less context, but wanted to get it to you ASAP
@@ -110,6 +111,10 @@ type Module interface { | |||
// | |||
// This may be nil, since older versions of the module would not have this stored. | |||
LintConfig() *buflintconfig.Config | |||
// Manifest returns the manifest for the module (possibly empty). | |||
Manifest() manifest.Manifest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This breaks the module abstraction (interface) - a module should not have to expose its underlying data in this way, the only access should be through GetModuleFile - if we do this, it needs to be strongly justified
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The manifest wraps the entire module - it isn't a 1-1 with a file within the module. This was the simplest way I could find to provide additional data at the time of caching to the cache layer. I can look again but I think some other interfaces (like bucket / path abstractions) make it harder to return a more specialized type here.
@@ -25,13 +25,25 @@ import ( | |||
"go.uber.org/zap" | |||
) | |||
|
|||
type moduleCache interface { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is this for? If we need to keep it, it should not be in this file
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was to ensure that the cas_module_cacher
and module_cacher
didn't diverge. When tamper proofing is enabled, we can remove the module_cacher_*
entirely and remove this interface.
// Manifest returns the manifest for the module (possibly empty). | ||
Manifest() manifest.Manifest | ||
// Blobs return the raw data for the module (possibly empty). | ||
Blobs() manifest.BlobSet |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess we'll need to have some backend PR as well to cover this change, ready to be merged when this one is?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will we rely on having these available in the backend for anything? We're only using them in the caching logic right now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 :)
private/buf/bufcli/bufcli.go
Outdated
@@ -549,19 +556,22 @@ func newModuleReaderAndCreateCacheDirs( | |||
cacheModuleDataDirPath := normalpath.Join(container.CacheDirPath(), v1CacheModuleDataRelDirPath) | |||
cacheModuleLockDirPath := normalpath.Join(container.CacheDirPath(), v1CacheModuleLockRelDirPath) | |||
cacheModuleSumDirPath := normalpath.Join(container.CacheDirPath(), v1CacheModuleSumRelDirPath) | |||
cacheModuleDirPathV2 := normalpath.Join(container.CacheDirPath(), v2CacheModuleRelDirPath) | |||
if err := checkExistingCacheDirs( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a question - in theory, should a new version of buf
need to create the v1 cache dirs at all? Will it once tamper proofing is everywhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope - we'd only need the v2 directory. I can update to create only the necessary directories.
@@ -110,6 +111,10 @@ type Module interface { | |||
// | |||
// This may be nil, since older versions of the module would not have this stored. | |||
LintConfig() *buflintconfig.Config | |||
// Manifest returns the manifest for the module (possibly nil). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we extend this comment to say what the manifest is for (all files including license, buf.md, etc)? Same for BlobSet, what is in the BlobSet, since BlobSet is a generic concept
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added some more documentation here.
private/bufpkg/bufmodule/bufmodulecache/repository_deprecation.go
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking great. No blocking comments, a few nits, if there are any upcoming PR those could be addressed there if you want.
blob, err := c.readBlob(ctx, moduleBasedir, manifestDigest) | ||
if err != nil { | ||
return nil, err | ||
} | ||
f, err := blob.Open(ctx) | ||
if err != nil { | ||
return nil, err | ||
} | ||
defer func() { | ||
retErr = multierr.Append(retErr, f.Close()) | ||
}() | ||
moduleManifest, err := manifest.NewFromReader(f) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Feels weird to be reading the blob twice here. It's an interface issue I guess. Food for thought.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We ended up doing it this way because we first validate the blob's contents against the digest before attempting to consume it as a manifest. I don't think we are really reading it twice here since the second open just returns an in-memory view of the blob's contents.
moduleManifest := module.Manifest() | ||
manifestBlob, err := moduleManifest.Blob() | ||
if err != nil { | ||
return err | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: You documented that .Manifest
could be nil, you could check that here before asking its .Blob
to avoid nil panic?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We check it a layer higher in the cas_module_reader, but I added an extra check here.
} | ||
manifestDigest := manifestBlob.Digest().String() | ||
if manifestDigest != modulePinDigest { | ||
// buf.lock module digest and BSR module don't match - fail without overwriting cache |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit
// buf.lock module digest and BSR module don't match - fail without overwriting cache | |
// buf.lock module digest and cached BSR module don't match - fail without overwriting cache |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't the cached module - it is the one we've just retrieved from the BSR due to a cache miss.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
Update the buf CLI's cache to take advantage of tamper proofing features (manifest, blobs, and digests) to store files as content addressable storage. When enabled, the new cache directory is stored as: ``` ~/.cache/buf/v2/module/{remote}/{owner}/{repo} blobs/ {digest[:2]}/ {digest[2:]} => The contents of the module's blob/manifest. commits/ {commit} => The manifest digest ```
Update the buf CLI's cache to take advantage of tamper proofing features (manifest, blobs, and digests) to store files as content addressable storage. When enabled, the new cache directory is stored as: ``` ~/.cache/buf/v2/module/{remote}/{owner}/{repo} blobs/ {digest[:2]}/ {digest[2:]} => The contents of the module's blob/manifest. commits/ {commit} => The manifest digest ```
Update the buf CLI's cache to take advantage of tamper proofing features (manifest, blobs, and digests) to store files as content addressable storage. When enabled, the new cache directory is stored as: