Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update cache to use tamper proofing #1847

Merged
merged 26 commits into from
Feb 27, 2023
Merged

Update cache to use tamper proofing #1847

merged 26 commits into from
Feb 27, 2023

Conversation

pkwarren
Copy link
Member

@pkwarren pkwarren commented Feb 17, 2023

Update the buf CLI's cache to take advantage of tamper proofing features (manifest, blobs, and digests) to store files as content addressable storage. When enabled, the new cache directory is stored as:

  ~/.cache/buf/v2/module/{remote}/{owner}/{repo}
      blobs/
        {digest[:2]}/
          {digest[2:]} => The contents of the module's blob/manifest.
      commits/
        {commit} => The manifest digest

Update the buf CLI's cache to take advantage of tamper proofing features
(manifest, blobs, and digests) to store files as content addressable
storage. When enabled, the new cache directory is stored as:

  ~/.cache/buf/v1/module
    cas/
      blobs/
        {digest[:2]}/
          {digest[2:]
      commits/
        {commit} => The manifest hex digest
      manifests/
        {digest[:2]}/
          {digest[2:]
@unmultimedio
Copy link
Member

unmultimedio commented Feb 17, 2023

Great! I'll take a loot at this work. Some questions:

  ~/.cache/buf/v1/module                      // what's current module cache dir structure?
    cas/
      blobs/
        {digest[:2]}/
          {digest[2:]}
      commits/
        {commit} => The manifest hex digest   // a file with just that content, right?
      manifests/
        {digest[:2]}/
          {digest[2:]}                        // does these manifest files live also in the blobs dir? or just here?

.golangci.yml Outdated Show resolved Hide resolved

import "sync"

type cacheStats struct {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Broke this out into a separate struct so it could be re-used across implementations.

"go.uber.org/zap"
)

func warnIfDeprecated(
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Broke this out into a utility method that could be called by both implementations.

private/pkg/manifest/manifest.go Show resolved Hide resolved
Copy link
Member

@bufdev bufdev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some preliminary comments - some may be off-base, this is just a quick pass and I have less context, but wanted to get it to you ASAP

private/buf/bufcli/bufcli.go Outdated Show resolved Hide resolved
private/buf/bufcli/bufcli.go Show resolved Hide resolved
private/buf/bufcli/bufcli.go Outdated Show resolved Hide resolved
@@ -110,6 +111,10 @@ type Module interface {
//
// This may be nil, since older versions of the module would not have this stored.
LintConfig() *buflintconfig.Config
// Manifest returns the manifest for the module (possibly empty).
Manifest() manifest.Manifest
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This breaks the module abstraction (interface) - a module should not have to expose its underlying data in this way, the only access should be through GetModuleFile - if we do this, it needs to be strongly justified

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The manifest wraps the entire module - it isn't a 1-1 with a file within the module. This was the simplest way I could find to provide additional data at the time of caching to the cache layer. I can look again but I think some other interfaces (like bucket / path abstractions) make it harder to return a more specialized type here.

@@ -25,13 +25,25 @@ import (
"go.uber.org/zap"
)

type moduleCache interface {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this for? If we need to keep it, it should not be in this file

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was to ensure that the cas_module_cacher and module_cacher didn't diverge. When tamper proofing is enabled, we can remove the module_cacher_* entirely and remove this interface.

private/pkg/manifest/storage.go Outdated Show resolved Hide resolved
.golangci.yml Outdated Show resolved Hide resolved
Comment on lines 114 to 117
// Manifest returns the manifest for the module (possibly empty).
Manifest() manifest.Manifest
// Blobs return the raw data for the module (possibly empty).
Blobs() manifest.BlobSet
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we'll need to have some backend PR as well to cover this change, ready to be merged when this one is?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will we rely on having these available in the backend for anything? We're only using them in the caching logic right now.

private/bufpkg/bufmodule/bufmodule.go Outdated Show resolved Hide resolved
.golangci.yml Outdated Show resolved Hide resolved
private/bufpkg/bufmodule/module.go Outdated Show resolved Hide resolved
@pkwarren pkwarren requested a review from bufdev February 24, 2023 22:53
Copy link
Contributor

@twilly twilly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 :)

@@ -549,19 +556,22 @@ func newModuleReaderAndCreateCacheDirs(
cacheModuleDataDirPath := normalpath.Join(container.CacheDirPath(), v1CacheModuleDataRelDirPath)
cacheModuleLockDirPath := normalpath.Join(container.CacheDirPath(), v1CacheModuleLockRelDirPath)
cacheModuleSumDirPath := normalpath.Join(container.CacheDirPath(), v1CacheModuleSumRelDirPath)
cacheModuleDirPathV2 := normalpath.Join(container.CacheDirPath(), v2CacheModuleRelDirPath)
if err := checkExistingCacheDirs(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a question - in theory, should a new version of buf need to create the v1 cache dirs at all? Will it once tamper proofing is everywhere?

Copy link
Member Author

@pkwarren pkwarren Feb 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope - we'd only need the v2 directory. I can update to create only the necessary directories.

@@ -110,6 +111,10 @@ type Module interface {
//
// This may be nil, since older versions of the module would not have this stored.
LintConfig() *buflintconfig.Config
// Manifest returns the manifest for the module (possibly nil).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we extend this comment to say what the manifest is for (all files including license, buf.md, etc)? Same for BlobSet, what is in the BlobSet, since BlobSet is a generic concept

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added some more documentation here.

Copy link
Member

@unmultimedio unmultimedio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking great. No blocking comments, a few nits, if there are any upcoming PR those could be addressed there if you want.

Comment on lines +199 to +210
blob, err := c.readBlob(ctx, moduleBasedir, manifestDigest)
if err != nil {
return nil, err
}
f, err := blob.Open(ctx)
if err != nil {
return nil, err
}
defer func() {
retErr = multierr.Append(retErr, f.Close())
}()
moduleManifest, err := manifest.NewFromReader(f)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feels weird to be reading the blob twice here. It's an interface issue I guess. Food for thought.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We ended up doing it this way because we first validate the blob's contents against the digest before attempting to consume it as a manifest. I don't think we are really reading it twice here since the second open just returns an in-memory view of the blob's contents.

Comment on lines 98 to 102
moduleManifest := module.Manifest()
manifestBlob, err := moduleManifest.Blob()
if err != nil {
return err
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: You documented that .Manifest could be nil, you could check that here before asking its .Blob to avoid nil panic?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We check it a layer higher in the cas_module_reader, but I added an extra check here.

}
manifestDigest := manifestBlob.Digest().String()
if manifestDigest != modulePinDigest {
// buf.lock module digest and BSR module don't match - fail without overwriting cache
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit

Suggested change
// buf.lock module digest and BSR module don't match - fail without overwriting cache
// buf.lock module digest and cached BSR module don't match - fail without overwriting cache

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't the cached module - it is the one we've just retrieved from the BSR due to a cache miss.

private/pkg/storage/storageos/bucket.go Outdated Show resolved Hide resolved
Copy link
Contributor

@twilly twilly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@pkwarren pkwarren merged commit e15702f into main Feb 27, 2023
@pkwarren pkwarren deleted the pkw/TCN-539-manifest-cache branch February 27, 2023 22:15
twoism pushed a commit that referenced this pull request Mar 2, 2023
Update the buf CLI's cache to take advantage of tamper proofing features
(manifest, blobs, and digests) to store files as content addressable
storage. When enabled, the new cache directory is stored as:

```
  ~/.cache/buf/v2/module/{remote}/{owner}/{repo}
      blobs/
        {digest[:2]}/
          {digest[2:]} => The contents of the module's blob/manifest.
      commits/
        {commit} => The manifest digest
```
Monirul1 pushed a commit to Monirul1/buf that referenced this pull request Apr 30, 2023
Update the buf CLI's cache to take advantage of tamper proofing features
(manifest, blobs, and digests) to store files as content addressable
storage. When enabled, the new cache directory is stored as:

```
  ~/.cache/buf/v2/module/{remote}/{owner}/{repo}
      blobs/
        {digest[:2]}/
          {digest[2:]} => The contents of the module's blob/manifest.
      commits/
        {commit} => The manifest digest
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants