Populate and prune entry event table #4411
Conversation
@amartinezfayo Can we start a review on this with priority?
I think we're still waiting on an experimental flag that encapsulates the whole feature. We can't assume the tables used by the queries will exist until one minor release after the minor release in which the table migrations landed.
We talked about this again today and realized that we probably shouldn't have an experimental flag for this until the whole feature is in place. Until then it probably belongs under a developer feature flag. We shouldn't wire up any of the new components (including populating the events table) unless that flag is on. Once the whole feature is in place, we can remove the developer feature flag in favor of an experimental flag that folks can use to exercise the new cache. And eventually, once we have confidence, we can make this the default behavior.
type ListRegistrationEntriesEventsResponse struct {
	EntryIDs     []string
	FirstEventID uint
What is this FirstEventID? Is it the ID of the first element in the response? How is it passed (or affected) on pagination?
Yes, it is the ID of the first element in the response.
I'm not sure about pagination. Let me test.
Do we need pagination for this API?
I'd put it in. Bulk deletion / addition APIs could (even though the data per event is small) overflow the single page response.
I tried to create unit tests for the pagination, but it seems to overlap somewhat with the greater-than-event-ID filter for this API. This is similar to the token half of the pagination, but with an unlimited page size. That's what we need for this functionality: all events that have happened since the last time we polled the API.
Given that the current implementation fetches all entries with no pagination, I'm not sure pagination is needed here. @edwbuck what do you think of removing the pagination? This would simplify the implementation.
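To make the filter concrete, here is a minimal sketch of the greater-than-event-ID query described above (the column set and helper name are assumptions for illustration, not this PR's exact implementation):

```go
package sketch

import (
	"time"

	"github.com/jinzhu/gorm"
)

// RegisteredEntryEvent stands in for the entry event rows referenced in this
// PR; the exact columns are an assumption for illustration.
type RegisteredEntryEvent struct {
	ID        uint `gorm:"primary_key"`
	EntryID   string
	CreatedAt time.Time
}

// listEventsGreaterThan returns every event recorded after the caller's
// watermark ID, in order and without pagination, which is the "all events
// since the last poll" behavior discussed above.
func listEventsGreaterThan(tx *gorm.DB, greaterThanEventID uint) ([]RegisteredEntryEvent, error) {
	var events []RegisteredEntryEvent
	if err := tx.Where("id > ?", greaterThanEventID).Order("id asc").Find(&events).Error; err != nil {
		return nil, err
	}
	return events, nil
}
```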
I don't have an issue with non-pagination. I just wanted to mention the bulk updates.
In my mind, if pagination becomes an issue, we can address it then. Honestly, the events have so few data items in them (just references to other items) that I imagine one would need a lot of them to overflow the response.
I don't have a strong opinion on pagination.
We can get into a bulk insert with several entries, or a server that is not consuming updates for some time; in that case we'll be returning several IDs here and fetching all of them together. Maybe we can "paginate" the response in the caller, just to prevent protobuf limits (in case someone has a lot of entries to update)?
But I don't have an issue with not supporting pagination. @azdagron do you have anything against removing pagination here?
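If protobuf message size ever does become a concern, caller-side batching along these lines would work; a minimal sketch (the batch size and function name are illustrative assumptions):

```go
package main

import "fmt"

// chunkEntryIDs splits a potentially large list of entry IDs into fixed-size
// batches so each follow-up fetch stays well under message-size limits.
func chunkEntryIDs(entryIDs []string, batchSize int) [][]string {
	var batches [][]string
	for start := 0; start < len(entryIDs); start += batchSize {
		end := start + batchSize
		if end > len(entryIDs) {
			end = len(entryIDs)
		}
		batches = append(batches, entryIDs[start:end])
	}
	return batches
}

func main() {
	ids := []string{"a", "b", "c", "d", "e"}
	fmt.Println(chunkEntryIDs(ids, 2)) // [[a b] [c d] [e]]
}
```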
@@ -3693,6 +3693,52 @@ func (s *PluginSuite) TestDeleteBundleDissociateRegistrationEntries() {
	s.Require().Empty(entry.FederatesWith)
}

func (s *PluginSuite) TestListRegistrationEntriesEvents() { |
Looks like there are no unit tests for pruning. I think we must validate that path too.
Added a test for pruning.
})
resp, err := s.ds.ListRegistrationEntriesEvents(ctx, &datastore.ListRegistrationEntriesEventsRequest{})
s.Require().NoError(err)
s.Require().Equal(1, len(resp.EntryIDs))
Can we compare against the expected IDs, using a list of expected values? The same comment applies to all of the equality comparisons.
Added comparisons
@@ -213,6 +217,7 @@ func TestListenAndServe(t *testing.T) {
		Metrics:                      metrics,
		RateLimit:                    rateLimit,
		EntryFetcherCacheRebuildTask: ef.RunRebuildCacheTask,
		EntryFetcherPruneEventsTask:  ef.PruneEventsTask,
If we are adding a default at this level, I think we must validate that value using a test.
pkg/server/endpoints/entryfetcher.go
case <-ctx.Done():
	a.log.Debug("Stopping event pruner")
	return nil
case <-a.clk.After(a.eventsPruneInterval):
So every 12h it will try to remove all entries that are older than 12h. If we have e1, e2, and e3, and e1 was created 13h ago and e2 11h ago, then if we run prune now, e1 will be removed, but e2 will only be removed in another 12h (when it is 23h old).
I'm not sure if we should have two intervals: one for how often we try to remove (it could be every 1m, 10m, or 1h), and another for the oldest entries we keep.
It has been a while, but I believe the cache_reload_interval is now used to see how often to check the cache, and the eventsPruneInterval is the time from "now" into the past that sets the pruning limit.
We can do it that way and prune every time we reload the cache, though there would be more delete operations.
We can also have the pruner run at a faster rate than it prunes, so it runs every 6 hours and prunes everything over 12 hours old.
Changed the name of the configurable to PruneEventsOlderThan and had the pruner task trigger at half of that interval.
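Sketched roughly, the resulting behavior looks like this (the type, field, and method names are illustrative assumptions, not the exact implementation): the task wakes at half of PruneEventsOlderThan and deletes events older than the full window.

```go
package sketch

import (
	"context"
	"log"
	"time"
)

// eventDatastore is a stand-in for the datastore method the pruner needs.
type eventDatastore interface {
	PruneRegistrationEntriesEvents(ctx context.Context, olderThan time.Duration) error
}

// eventPruner triggers at half of pruneEventsOlderThan and asks the
// datastore to delete events older than the full retention window.
type eventPruner struct {
	ds                   eventDatastore
	pruneEventsOlderThan time.Duration
}

func (p *eventPruner) run(ctx context.Context) error {
	ticker := time.NewTicker(p.pruneEventsOlderThan / 2)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			log.Print("Stopping event pruner")
			return ctx.Err()
		case <-ticker.C:
			if err := p.ds.PruneRegistrationEntriesEvents(ctx, p.pruneEventsOlderThan); err != nil {
				log.Printf("Failed to prune events: %v", err)
			}
		}
	}
}
```

With this split, an event lives for at most roughly 1.5x PruneEventsOlderThan before a prune pass removes it, which bounds the worst-case delay raised above.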
entries := make(map[spiffeid.ID][]*types.Entry)
buildCache := func(context.Context) (entrycache.Cache, error) {
	return newStaticEntryCache(entries), nil
}

ef, err := NewAuthorizedEntryFetcherWithFullCache(ctx, buildCache, log, clk, defaultCacheReloadInterval)
pruneEventsFn := func(context.Context, time.Duration) error {
None of these functions actually use prune.
I think we'll need to add unit tests to validate prune too.
test added
@@ -645,6 +646,14 @@ func NewServerConfig(c *Config, logOptions []log.Option, allowUnknownConfig bool
		sc.CacheReloadInterval = interval
	}

	if c.Server.Experimental.PruneEventsOlderThan != "" {
Can you add unit tests for this (if we are going to merge this config)?
added
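For reference, a sketch of the kind of parsing the new config block performs (the helper name, config key spelling, and the 12h fallback are assumptions for illustration; the real default may differ):

```go
package main

import (
	"fmt"
	"time"
)

// parsePruneEventsOlderThan turns the experimental string value into a
// duration for the server config, falling back to an assumed default when
// the value is unset.
func parsePruneEventsOlderThan(raw string) (time.Duration, error) {
	if raw == "" {
		return 12 * time.Hour, nil
	}
	d, err := time.ParseDuration(raw)
	if err != nil {
		return 0, fmt.Errorf("could not parse prune_events_older_than: %w", err)
	}
	return d, nil
}

func main() {
	d, err := parsePruneEventsOlderThan("6h")
	if err != nil {
		panic(err)
	}
	fmt.Println(d) // 6h0m0s
}
```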
resp, err := s.ds.ListRegistrationEntriesEvents(ctx, &datastore.ListRegistrationEntriesEventsRequest{})
s.Require().NoError(err)
return reflect.DeepEqual(tt.expectedEntryIDs, resp.EntryIDs)
}, 10*time.Second, 50*time.Millisecond, "Failed to prune entries correctly")
What do you think about adding a clock to the datastore layer (and a mock here), and using that instead of time.Now?
That would allow you to add test cases where you advance time, making for more robust cases with more events and partial pruning.
I think that's a great idea. Is it ok to do that in a follow-on PR? I think it's a little out of scope for this one.
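For that follow-up, a sketch of the idea (the Clock interface and names here are illustrative, not the project's actual clock package): injecting a clock lets a test advance time deterministically and check that only the older events fall past the prune cutoff.

```go
package main

import (
	"fmt"
	"time"
)

// Clock is a minimal stand-in for a clock abstraction the datastore could
// depend on instead of calling time.Now directly.
type Clock interface {
	Now() time.Time
}

// fakeClock lets tests advance time deterministically.
type fakeClock struct{ now time.Time }

func (f *fakeClock) Now() time.Time      { return f.now }
func (f *fakeClock) Add(d time.Duration) { f.now = f.now.Add(d) }

// pruneCutoff is the piece of the prune logic that benefits from the
// injected clock: events created before the cutoff would be deleted.
func pruneCutoff(clk Clock, olderThan time.Duration) time.Time {
	return clk.Now().Add(-olderThan)
}

func main() {
	clk := &fakeClock{now: time.Date(2023, 8, 1, 0, 0, 0, 0, time.UTC)}
	fmt.Println(pruneCutoff(clk, 12*time.Hour)) // 2023-07-31 12:00:00 +0000 UTC
	clk.Add(6 * time.Hour)
	fmt.Println(pruneCutoff(clk, 12*time.Hour)) // 2023-07-31 18:00:00 +0000 UTC
}
```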
select {
case err := <-watchErr:
	assert.NoError(t, err)
case <-time.After(5 * time.Second):
You can rely on ctx.Done() here.
Are you requesting to put case <-ctx.Done(): here?
Since the context will finish (with a timeout), you can rely on ctx.Done().
In this case we are calling cancel() just a few lines above, on line 239. If we use ctx.Done(), that case will always be ready, so we would never check for the watchErr on line 241.
clk.Add(defaultPruneEventsOlderThan)
select {
case <-pruneEventsCh:
case <-time.After(5 * time.Second):
You can depend on ctx.Done() here.
Changed.
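The resulting pattern, sketched (the channel and helper names are illustrative): the test waits on either the prune notification or the test context's Done channel instead of a separate time.After timeout.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// waitForPrune blocks until the pruner signals a completed run or the test
// context expires, whichever comes first.
func waitForPrune(ctx context.Context, pruneEventsCh <-chan struct{}) error {
	select {
	case <-pruneEventsCh:
		return nil
	case <-ctx.Done():
		return fmt.Errorf("timed out waiting for prune: %w", ctx.Err())
	}
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), time.Second)
	defer cancel()

	pruneEventsCh := make(chan struct{}, 1)
	pruneEventsCh <- struct{}{} // simulate the pruner signalling a completed run

	fmt.Println(waitForPrune(ctx, pruneEventsCh)) // <nil>
}
```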
DCO fixed and branch up to date with latest.
func listRegistrationEntriesEvents(tx *gorm.DB, req *datastore.ListRegistrationEntriesEventsRequest) (*datastore.ListRegistrationEntriesEventsResponse, error) {
	var events []RegisteredEntryEvent
Since this is experimental, it must be called only when the feature flag is enabled.
Suggested change:
func listRegistrationEntriesEvents(tx *gorm.DB, req *datastore.ListRegistrationEntriesEventsRequest) (*datastore.ListRegistrationEntriesEventsResponse, error) {
	if !fflag.IsSet(fflag.FlagEventsBasedCache) {
		return &datastore.ListRegistrationEntriesEventsResponse{}, nil
	}
	var events []RegisteredEntryEvent
func pruneRegistrationEntriesEvents(tx *gorm.DB, olderThan time.Duration) error {
	if err := tx.Where("created_at < ?", time.Now().Add(-olderThan)).Delete(&RegisteredEntryEvent{}).Error; err != nil {
Since this is experimental, it must be called only when the feature flag is enabled.
Suggested change:
func pruneRegistrationEntriesEvents(tx *gorm.DB, olderThan time.Duration) error {
	if !fflag.IsSet(fflag.FlagEventsBasedCache) {
		return nil
	}
	if err := tx.Where("created_at < ?", time.Now().Add(-olderThan)).Delete(&RegisteredEntryEvent{}).Error; err != nil {
Pull Request check list
Affected functionality
SPIRE datastore
Description of change
Second part of the datastore enhancement: populate and prune the registration entry event table.
The cache is not updated with this PR. Node events are not populated yet either. That will be in the next PR.
Which issue this PR fixes
Related to #2182