Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

config_wrapper: Add support for named stores #7946

Merged
merged 17 commits into from
Jan 10, 2023

Conversation

ashwanthgoli
Copy link
Contributor

Signed-off-by: Ashwanth Goli iamashwanth@gmail.com

What this PR does / why we need it:
This PR adds support for named stores which allows users to define multiple object store configurations for a storage type. Named stores can be referred to from other sections of the config similar to how provider type is being used i.e aws, gcs.... Named store reference should be of the format: <store_type>.<key_name>

Below is an example configuration with two named stores defined for aws store-1, store-2. Schema config is referring to one of them as aws.store-1

storage_config:
  aws:
    endpoint: s3://common-bucket
    region: us-east1
    access_key_id: abc123
    secret_access_key: def789
  named_stores:
    aws:
      store-1:
        endpoint: s3://foo-bucket
        region: us-west1
        access_key_id: 123abc
        secret_access_key: 789def
      store-2:
        endpoint: s3://bar-bucket
        region: us-west2
        access_key_id: 456def
        secret_access_key: 789abc
schema_config:
    configs:
        - from: "2020-07-31"
            index:
              period: 24h
              prefix: loki_prod_index_
            object_store: aws.store-1
            schema: v11
            store: boltdb-shipper

Which issue(s) this PR fixes:
Fixes #7276

Special notes for your reviewer:
Configuration defined in common storage config has no effect on the named stores. Common storage config only overrides storageConfig if it's not explicitly set - same behavior as before. This should be a non-intrusive change.

Checklist

  • Reviewed the CONTRIBUTING.md guide
  • Documentation added
  • Tests updated
  • CHANGELOG.md updated
  • Changes that require user attention or interaction to upgrade are documented in docs/sources/upgrading/_index.md

Signed-off-by: Ashwanth Goli <iamashwanth@gmail.com>
@github-actions github-actions bot added the type/docs Issues related to technical documentation; the Docs Squad uses this label across many repositories label Dec 23, 2022
@grafanabot
Copy link
Collaborator

./tools/diff_coverage.sh ../loki-target-branch/test_results.txt test_results.txt ingester,distributor,querier,querier/queryrange,iter,storage,chunkenc,logql,loki

Change in test coverage per package. Green indicates 0 or positive change, red indicates that test coverage for a package fell.

+           ingester	0%
+        distributor	0%
+            querier	0%
+ querier/queryrange	0%
+               iter	0%
-            storage	-1.9%
+           chunkenc	0%
+              logql	0%
+               loki	0%

@ashwanthgoli ashwanthgoli marked this pull request as ready for review December 23, 2022 08:35
@ashwanthgoli ashwanthgoli requested a review from a team as a code owner December 23, 2022 08:35
@grafanabot
Copy link
Collaborator

./tools/diff_coverage.sh ../loki-target-branch/test_results.txt test_results.txt ingester,distributor,querier,querier/queryrange,iter,storage,chunkenc,logql,loki

Change in test coverage per package. Green indicates 0 or positive change, red indicates that test coverage for a package fell.

+           ingester	0%
+        distributor	0%
+            querier	0%
+ querier/queryrange	0%
+               iter	0%
-            storage	-1.9%
+           chunkenc	0%
+              logql	0%
+               loki	0%

Copy link
Contributor

@ssncferreira ssncferreira left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍 and great job updating the _index.md file with the storage configuration flags 🏆

pkg/storage/factory.go Outdated Show resolved Hide resolved
pkg/storage/factory.go Outdated Show resolved Hide resolved
pkg/storage/factory.go Outdated Show resolved Hide resolved
pkg/storage/factory.go Outdated Show resolved Hide resolved
@@ -217,8 +222,20 @@ var (
},
{
Name: "local_storage_config",
StructType: reflect.TypeOf(local.Config{}),
StructType: reflect.TypeOf(local.FSConfig{}),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch ⭐

docs/sources/configuration/_index.md Outdated Show resolved Hide resolved
Copy link
Contributor

@ssncferreira ssncferreira left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 🌟 thanks for addressing the comments. Just adding some minor comments about the unit tests 🙂

secret_access_key: 789abc`
config, _ := testContext(namedStoresConfig, nil)

// should be set by common config
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case, since we defined storage in 1) common config, and in 2) named storage, the common config will be ignored and the named storage will be the one used, right? Any way we can assert that in the test?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

common config won't be ignored as it overrides StorageConfig (if it's not configured). And users can choose to use one or both by setting objectType across periodConfigs.
aws for using configuration from StorageConfig (which actually comes from common config)
aws.named-store-key-name for using the named store

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah got it 👍 not necessary for this PR, but maybe it would be useful to add this information to period_config, that one can set it via the StorageConfig or via named_stored

pkg/storage/factory_test.go Outdated Show resolved Hide resolved
pkg/storage/factory_test.go Outdated Show resolved Hide resolved
Copy link
Contributor

@ssncferreira ssncferreira left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 🏆 the check-doc CI step is failing, so you need to re-run make doc to update the _index.md file

@grafanabot
Copy link
Collaborator

./tools/diff_coverage.sh ../loki-target-branch/test_results.txt test_results.txt ingester,distributor,querier,querier/queryrange,iter,storage,chunkenc,logql,loki

Change in test coverage per package. Green indicates 0 or positive change, red indicates that test coverage for a package fell.

+           ingester	0%
+        distributor	0%
+            querier	0%
+ querier/queryrange	0%
+               iter	0%
-            storage	-1.9%
+           chunkenc	0%
+              logql	0%
+               loki	0%

Copy link
Contributor

@dannykopping dannykopping left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! I think the tests are insufficient though

Comment on lines 239 to 242
// extact storeType for named stores
if words := strings.SplitN(name, ".", 2); len(words) == 2 {
storeType = words[0]
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be its own function with a test associated

tools/doc-generator/parse/parser.go Show resolved Hide resolved
pkg/storage/factory_test.go Show resolved Hide resolved
pkg/storage/factory_test.go Show resolved Hide resolved
)

// extact storeType for named stores
if words := strings.SplitN(name, ".", 2); len(words) == 2 {
Copy link
Member

@owen-d owen-d Jan 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than requiring we use <type>.<name> as a convention in the configuration, what if we allowed specifying only the store's name and resolved the type from that? As a bonus, this would also work with stores that used the reserved . characters, such as name: "my.s3.compatible.store". I think this would be easier to use and understand (I had to check test cases to realize the convention of putting the . in the store name to look it up). Note: if we made this change, all the individual types would be reserved store names (for example, you couldn't redefine another store with name aws).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was also a bit worried that this convention might not be the most intuitive.
Just to make sure I understand it correctly: we want to drop the type prefix when referring to the named store, but the way we are configuring names stores can remain the same right?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd make it so we can refer to a single store by the name alone and the type could be inferred. Validation would ensure that there are no duplicates across different types and there are no custom stores with legacy provider types (i.e. aws).

Hopefully this answers your question.

@grafanabot
Copy link
Collaborator

./tools/diff_coverage.sh ../loki-target-branch/test_results.txt test_results.txt ingester,distributor,querier,querier/queryrange,iter,storage,chunkenc,logql,loki

Change in test coverage per package. Green indicates 0 or positive change, red indicates that test coverage for a package fell.

+           ingester	0%
+        distributor	0%
+            querier	0%
+ querier/queryrange	0%
+               iter	0%
+            storage	0%
+           chunkenc	0%
+              logql	0%
+               loki	0%

@ashwanthgoli ashwanthgoli requested a review from owen-d January 5, 2023 11:26
@grafanabot
Copy link
Collaborator

./tools/diff_coverage.sh ../loki-target-branch/test_results.txt test_results.txt ingester,distributor,querier,querier/queryrange,iter,storage,chunkenc,logql,loki

Change in test coverage per package. Green indicates 0 or positive change, red indicates that test coverage for a package fell.

+           ingester	0%
+        distributor	0%
+            querier	0%
+ querier/queryrange	0%
+               iter	0%
+            storage	0%
+           chunkenc	0%
+              logql	0%
+               loki	0%

@grafanabot
Copy link
Collaborator

./tools/diff_coverage.sh ../loki-target-branch/test_results.txt test_results.txt ingester,distributor,querier,querier/queryrange,iter,storage,chunkenc,logql,loki

Change in test coverage per package. Green indicates 0 or positive change, red indicates that test coverage for a package fell.

+           ingester	0%
+        distributor	0%
+            querier	0%
+ querier/queryrange	0.1%
+               iter	0%
-            storage	-1.4%
+           chunkenc	0%
+              logql	0%
+               loki	0%

pkg/storage/factory.go Outdated Show resolved Hide resolved
return nil
}

func (ns NamedStores) lookupStoreType(name string) (string, bool) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels a bit weird to me; why are we looking it up in the YAML rather than capturing this type when we instantiate the named store?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NewObjectClient or NewChunkClient only accept the named store reference key or one of the predefined stores name (aws, gcs...) but with this name alone we do not know which map to access among the fields of NamedStores.

so we'd have to lookup each of these maps. Instead using a bit of meta programming to reduce the amount of boilerplate code.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, I think it'd be far simpler to just keep a mapping in the NamedStores struct of identifier to types than doing all this fancy footwork.

I don't want to hold up this PR further, so we can leave it as-is, but in general I'd suggest erring on the side of simplicity and readability over conciseness; it'll work out better in the long run (IMHO).

}

func (ns NamedStores) validate() error {
for name, awsCfg := range ns.AWS {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Howcome we're only validating these 3, out of interest?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

only 3 out of the supported named stores have a validation method

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, just a suggestion for the future:
It'd be cleaner to iterate over each config and see if it satisfies

interface {
  Validate() error
}

than special-casing these 3 stores. It also has the benefit of expressing the intent more clearly (as opposed to this code which seems to only validate these 3 for some unknown reason).

@@ -50,23 +54,65 @@ func TestFactoryStop(t *testing.T) {
store.Stop()
}

func TestCassandraInMultipleSchemas(t *testing.T) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm curious why Cassandra is being tested specifically here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure, I didn't touch this
must have moved it when adding new tests


store.Stop()
// FSObjectClient creates the configured dir on init, ensure that correct cfg is picked by checking for it.
_, err = os.Stat(cfg.NamedStores.Filesystem["named-store"].Directory)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should first check that this directory does not exist before instantiating the store, to account for false-positives.


// dir specified in StorageConfig/FSConfig should not be created as we are not referring to it.
_, err = os.Stat(cfg.FSConfig.Directory)
require.Error(t, err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems redundant

Suggested change
require.Error(t, err)

@@ -65,6 +71,98 @@ type NamedStores struct {
Swift map[string]openstack.SwiftConfig `yaml:"swift"`
}

func (ns NamedStores) checkForDuplicates() error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please explain the rationale for approaching validation in this way?
Wouldn't it be easier to just retrieve the config and scan through all period configs and find duplicates across store types? Or are you doing it this way to spot duplicates within the same type?

This seems quite complicated, and I'm curious what problems you ran into here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

finding duplicates would require checking the keys from each of the maps defined under NamedStores. Doing it this way to avoid the amount of redundant code needed to lookup each field.
I could drop this for the simpler approach too.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Simpler is usually better; it's less prone to bugs.
No need to change it now, just a suggestion for the future.

config.StorageTypeGCP, config.StorageTypeGCPColumnKey, config.StorageTypeBigTable, config.StorageTypeBigTableHashed, config.StorageTypeGCS,
config.StorageTypeAzure, config.StorageTypeBOS, config.StorageTypeSwift, config.StorageTypeCassandra,
config.StorageTypeFileSystem, config.StorageTypeInMemory, config.StorageTypeGrpc:
return fmt.Errorf("named store %s should not match with the name of a predefined storage type", name)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: i'd suggest changing these from %s to %q so that error messages will be a little more clear:

named store aws should not match...
vs
named store "aws" should not match...

newSchemaCfg.From = config.DayTime{Time: model.Now()}
storeType, ok := ns.lookupStoreType("store-1")
assert.True(t, ok)
assert.Equal(t, "gcs", storeType)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use constants for these store type values

Copy link
Contributor

@dannykopping dannykopping left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for being patient with the review feedback and simplifying the approach!

@grafanabot
Copy link
Collaborator

./tools/diff_coverage.sh ../loki-target-branch/test_results.txt test_results.txt ingester,distributor,querier,querier/queryrange,iter,storage,chunkenc,logql,loki

Change in test coverage per package. Green indicates 0 or positive change, red indicates that test coverage for a package fell.

+           ingester	0%
+        distributor	0%
+            querier	0%
- querier/queryrange	-0.1%
+               iter	0%
-            storage	-1.7%
+           chunkenc	0%
+              logql	0%
+               loki	0%

Copy link
Contributor

@sandeepsukhani sandeepsukhani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Nice work!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
size/XL type/docs Issues related to technical documentation; the Docs Squad uses this label across many repositories
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Allow for multiple store configurations from the same provider
6 participants