Populate and prune entry event table #4411

faisal-memon · 2023-08-04T20:47:11Z

Pull Request check list

Commit conforms to CONTRIBUTING.md?
Proper tests/regressions included?
Documentation updated?

Affected functionality
SPIRE datastore

Description of change
Second part of datastore enhancement:

Creates an event every time an entry is created/updated/deleted.
Events are pruned every 12 hours. Pruning interval is configurable.
Creates 2 new datastore APIs to list and prune events
Listing events can be filtered to return only events with an ID greater than a given event ID

The cache is not updated in this PR. Node events are not populated yet either. That will be in the next PR.

Which issue this PR fixes
Related to #2182

edwbuck · 2023-08-09T16:07:51Z

@amartinezfayo Can we start a review on this with priority?

azdagron · 2023-08-09T17:03:16Z

I think we're still waiting on an experimental flag that encapsulates the whole feature. We can't assume the tables used by the queries will exist until one minor release after the minor release in which the table migrations landed.

cmd/spire-server/cli/run/run.go

azdagron · 2023-08-10T18:55:16Z

We talked about this again today and realized that we probably shouldn't have an experimental flag for this until the whole feature is in place. Until then it probably belongs under a developer feature flag. We shouldn't wire up any of the new components (including populating the events table), unless that flag is on. Once the whole feature is in place, we can remove the developer feature flag in favor of an experimental flag that folks can use to exercise the new cache. And eventually once we have confidence, we can make this the default behavior.

pkg/common/telemetry/server/datastore/event.go

pkg/server/datastore/datastore.go

MarcosDY · 2023-08-15T20:17:12Z

pkg/server/datastore/datastore.go

+
+type ListRegistrationEntriesEventsResponse struct {
+	EntryIDs     []string
+	FirstEventID uint


What is this FirstEventID?
Is it the ID of the first element in the response? How is it passed (or affected) on pagination?

Yes, it is the ID of the first element in the response.

I'm not sure about pagination. Let me test.

Do we need pagination for this api?

I'd put it in. Bulk deletion / addition APIs could (even though the data per event is small) overflow the single page response.

I tried to create unit tests for the pagination, but it seems to overlap somewhat with the greater-than-event-ID filter specific to this API. This is similar to the token half of pagination, but with an unlimited page size. That's what we need for this functionality: all events that happened since the last time we polled the API.
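The "all events since the last poll" pattern amounts to the caller remembering the highest event ID it has processed and passing it as the lower bound on the next call. The sketch below assumes a hypothetical `listEventsFunc` in place of the real datastore call, and deliberately ignores concerns such as event IDs that commit out of order.

```go
package main

import "fmt"

// event is a hypothetical, minimal event record: an event ID and the affected entry ID.
type event struct {
	ID      uint
	EntryID string
}

// listEventsFunc stands in for a datastore call filtered by "event ID greater than afterID".
type listEventsFunc func(afterID uint) []event

// poller remembers the highest event ID it has seen, so each poll asks only
// for events that happened since the previous poll (no page size, no token).
type poller struct {
	lastID uint
	list   listEventsFunc
}

// poll returns the IDs of entries that changed since the last poll and advances lastID.
func (p *poller) poll() []string {
	var changed []string
	for _, e := range p.list(p.lastID) {
		changed = append(changed, e.EntryID)
		if e.ID > p.lastID {
			p.lastID = e.ID
		}
	}
	return changed
}

func main() {
	store := []event{{ID: 1, EntryID: "a"}, {ID: 2, EntryID: "b"}}
	list := func(afterID uint) []event {
		var out []event
		for _, e := range store {
			if e.ID > afterID {
				out = append(out, e)
			}
		}
		return out
	}
	p := &poller{list: list}
	fmt.Println(p.poll()) // [a b]
	store = append(store, event{ID: 3, EntryID: "a"})
	fmt.Println(p.poll()) // [a]
	fmt.Println(p.poll()) // []
}
```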

Given that the current implementation fetches all entries without pagination, I'm not sure pagination is needed here. @edwbuck, what do you think of removing the pagination? This would simplify the implementation.

I don't have an issue with non-pagination. I just wanted to mention the bulk updates.

In my mind, if pagination becomes an issue, we can address it then. Honestly, the events have so few data items in them (just references to other items) that I imagine one would need a lot of them to overflow the response.

I don't have a strong opinion on pagination.
We could get a bulk insert with several entries, or a server that has not consumed updates for some time. In that case we'll be returning many IDs here and fetching all of them together. Maybe we can "paginate" the response in the caller, just to avoid hitting protobuf limits (in case someone has a lot of entries to update)?

But I don't have an issue with not supporting pagination. @azdagron, do you have anything against removing pagination here?
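"Paginating" in the caller could be as simple as splitting the returned entry IDs into fixed-size batches before fetching them. This is a sketch under that assumption; `chunkIDs` is a hypothetical helper, and a suitable batch size would depend on actual message-size limits, which this PR does not specify.

```go
package main

import "fmt"

// chunkIDs splits ids into consecutive batches of at most size elements.
// The batches are sub-slices that share the input's backing array.
func chunkIDs(ids []string, size int) [][]string {
	if size <= 0 {
		panic("chunk size must be positive")
	}
	var batches [][]string
	for len(ids) > 0 {
		n := size
		if len(ids) < n {
			n = len(ids)
		}
		batches = append(batches, ids[:n])
		ids = ids[n:]
	}
	return batches
}

func main() {
	ids := []string{"e1", "e2", "e3", "e4", "e5"}
	for _, batch := range chunkIDs(ids, 2) {
		fmt.Println(batch) // [e1 e2], then [e3 e4], then [e5]
	}
}
```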

pkg/server/datastore/sqlstore/sqlstore.go

MarcosDY · 2023-08-15T20:28:04Z

pkg/server/datastore/sqlstore/sqlstore_test.go

@@ -3693,6 +3693,52 @@ func (s *PluginSuite) TestDeleteBundleDissociateRegistrationEntries() {
 	s.Require().Empty(entry.FederatesWith)
 }

+func (s *PluginSuite) TestListRegistrationEntriesEvents() {


It looks like there are no unit tests for pruning. I think we must validate that path too.

Added a test for pruning.

MarcosDY · 2023-08-15T20:29:40Z

pkg/server/datastore/sqlstore/sqlstore_test.go

+	})
+	resp, err := s.ds.ListRegistrationEntriesEvents(ctx, &datastore.ListRegistrationEntriesEventsRequest{})
+	s.Require().NoError(err)
+	s.Require().Equal(1, len(resp.EntryIDs))


Could we compare against the expected IDs, using a list of expected values? The same comment applies to all the equality comparisons.

Added comparisons

MarcosDY · 2023-08-15T20:33:15Z

pkg/server/endpoints/endpoints_test.go

@@ -213,6 +217,7 @@ func TestListenAndServe(t *testing.T) {
 		Metrics:                      metrics,
 		RateLimit:                    rateLimit,
 		EntryFetcherCacheRebuildTask: ef.RunRebuildCacheTask,
+		EntryFetcherPruneEventsTask:  ef.PruneEventsTask,


If we are adding a default at this level, I think we must validate that value with a test.

MarcosDY · 2023-08-15T20:49:19Z

pkg/server/endpoints/entryfetcher.go

+		case <-ctx.Done():
+			a.log.Debug("Stopping event pruner")
+			return nil
+		case <-a.clk.After(a.eventsPruneInterval):


So every 12h it will try to remove all events that are older than 12h.
If we have e1, e2, and e3, where e1 was created 13h ago and e2 11h ago,
and we run prune now, e1 will be removed now, but e2 will only be removed in another 12h (by which time it will be 23h old).

I'm wondering whether we should have 2 intervals:
one for how often we try to prune (it could be every 1m, 10m, or 1h), and
another for the maximum age of the events we keep.

It has been a while, but I believe the cache_reload_interval is now used to see how often to check the cache, and the eventsPruneInterval is the time from "now" into the past that sets the pruning limit.

We could do it that way and prune every time we reload the cache, though there would be more delete operations.

We could also have the pruner run more often than the pruning age. For example, it runs every 6 hours and prunes everything more than 12 hours old.

Changed the name of the configurable to PruneEventsOlderThan and have the pruner task trigger at half of that.

MarcosDY · 2023-08-15T20:51:53Z

pkg/server/endpoints/entryfetcher_test.go

 	entries := make(map[spiffeid.ID][]*types.Entry)
 	buildCache := func(context.Context) (entrycache.Cache, error) {
 		return newStaticEntryCache(entries), nil
 	}

-	ef, err := NewAuthorizedEntryFetcherWithFullCache(ctx, buildCache, log, clk, defaultCacheReloadInterval)
+	pruneEventsFn := func(context.Context, time.Duration) error {


None of these functions actually use prune.
I think we'll need to add unit tests to validate pruning too.

pkg/server/datastore/sqlstore/sqlstore.go

faisal-memon · 2023-08-31T04:12:49Z

@azdagron @MarcosDY All comments have been addressed. Ready for review.

pkg/server/datastore/sqlstore/sqlstore.go

MarcosDY · 2023-09-04T16:39:45Z

cmd/spire-server/cli/run/run.go

@@ -645,6 +646,14 @@ func NewServerConfig(c *Config, logOptions []log.Option, allowUnknownConfig bool
 		sc.CacheReloadInterval = interval
 	}

+	if c.Server.Experimental.PruneEventsOlderThan != "" {


Can you add unit tests for this (if we are going to merge this config)?

pkg/common/telemetry/server/datastore/event.go

MarcosDY · 2023-09-04T16:50:20Z

pkg/server/datastore/datastore.go


MarcosDY · 2023-09-04T17:31:16Z

pkg/server/datastore/sqlstore/sqlstore_test.go

+				resp, err := s.ds.ListRegistrationEntriesEvents(ctx, &datastore.ListRegistrationEntriesEventsRequest{})
+				s.Require().NoError(err)
+				return reflect.DeepEqual(tt.expectedEntryIDs, resp.EntryIDs)
+			}, 10*time.Second, 50*time.Millisecond, "Failed to prune entries correctly")


What do you think about adding a clock to the datastore layer (and a mock clock here), and using that instead of time.Now?
That would let you add more robust test cases that advance time, with more events, only some of which get pruned.

I think that's a great idea. Is it OK to do that in a follow-on PR? I think it's a little out of scope for this one.

MarcosDY · 2023-09-04T17:40:14Z

pkg/server/endpoints/entryfetcher_test.go

+		select {
+		case err := <-watchErr:
+			assert.NoError(t, err)
+		case <-time.After(5 * time.Second):


You can rely on ctx.Done() here.

Are you asking me to add case <-ctx.Done():?

Since the context will finish (with a timeout), you can rely on ctx.Done().

In this case we are calling cancel() just a few lines above on line 239. If we use ctx.Done() that will always be true so we never check for the watchErr on line 241.

pkg/server/endpoints/entryfetcher_test.go

MarcosDY · 2023-09-04T17:45:16Z

pkg/server/endpoints/entryfetcher_test.go

+	clk.Add(defaultPruneEventsOlderThan)
+	select {
+	case <-pruneEventsCh:
+	case <-time.After(5 * time.Second):


You can depend on ctx.Done() here.

Signed-off-by: Faisal Memon <fymemon@yahoo.com>

Bumps [sigs.k8s.io/controller-runtime](https://github.com/kubernetes-sigs/controller-runtime) from 0.15.0 to 0.15.1. - [Release notes](https://github.com/kubernetes-sigs/controller-runtime/releases) - [Changelog](https://github.com/kubernetes-sigs/controller-runtime/blob/main/RELEASE.md) - [Commits](kubernetes-sigs/controller-runtime@v0.15.0...v0.15.1) --- updated-dependencies: - dependency-name: sigs.k8s.io/controller-runtime dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Signed-off-by: Faisal Memon <fymemon@yahoo.com>

Bumps [golang.org/x/sys](https://github.com/golang/sys) from 0.10.0 to 0.11.0. - [Commits](golang/sys@v0.10.0...v0.11.0) --- updated-dependencies: - dependency-name: golang.org/x/sys dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Signed-off-by: Faisal Memon <fymemon@yahoo.com>

Signed-off-by: Zack Train <ztrain@uber.com> Signed-off-by: Faisal Memon <fymemon@yahoo.com>

Bumps [golang.org/x/net](https://github.com/golang/net) from 0.13.0 to 0.14.0. - [Commits](golang/net@v0.13.0...v0.14.0) --- updated-dependencies: - dependency-name: golang.org/x/net dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Signed-off-by: Faisal Memon <fymemon@yahoo.com>

…ncy telemetry util (spiffe#4399) * Add telemetry instrumentation for delegated identity API and add latency telemetry util Signed-off-by: chiragk25 <chirag.d.kapadia@gmail.com> Signed-off-by: Faisal Memon <fymemon@yahoo.com>

…spiffe#4416) Bumps [github.com/aws/aws-sdk-go-v2/service/ec2](https://github.com/aws/aws-sdk-go-v2) from 1.109.1 to 1.110.1. - [Release notes](https://github.com/aws/aws-sdk-go-v2/releases) - [Commits](aws/aws-sdk-go-v2@service/ec2/v1.109.1...service/ec2/v1.110.1) --- updated-dependencies: - dependency-name: github.com/aws/aws-sdk-go-v2/service/ec2 dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Signed-off-by: Faisal Memon <fymemon@yahoo.com>

Bumps [actions/setup-go](https://github.com/actions/setup-go) from 4.0.1 to 4.1.0. - [Release notes](https://github.com/actions/setup-go/releases) - [Commits](actions/setup-go@fac708d...93397be) --- updated-dependencies: - dependency-name: actions/setup-go dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Signed-off-by: Faisal Memon <fymemon@yahoo.com>

Bumps [google.golang.org/api](https://github.com/googleapis/google-api-go-client) from 0.134.0 to 0.136.0. - [Release notes](https://github.com/googleapis/google-api-go-client/releases) - [Changelog](https://github.com/googleapis/google-api-go-client/blob/main/CHANGES.md) - [Commits](googleapis/google-api-go-client@v0.134.0...v0.136.0) --- updated-dependencies: - dependency-name: google.golang.org/api dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Signed-off-by: Faisal Memon <fymemon@yahoo.com>

Bumps [github.com/sigstore/sigstore](https://github.com/sigstore/sigstore) from 1.7.1 to 1.7.2. - [Release notes](https://github.com/sigstore/sigstore/releases) - [Commits](sigstore/sigstore@v1.7.1...v1.7.2) --- updated-dependencies: - dependency-name: github.com/sigstore/sigstore dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Signed-off-by: Faisal Memon <fymemon@yahoo.com>

This project generates releases by just creating a new release branch without a corresponding semver tag, and changing the major version tag to point to the release branch, which isn't enough for dependabot to automatically detect the new versions, see msys2/setup-msys2#327 Manually update this step for now to the current commit pointed to by the `v2` tag (`v2.20.0`): https://github.com/msys2/setup-msys2/tree/v2 Signed-off-by: Ryan Turner <turner@uber.com> Signed-off-by: Faisal Memon <fymemon@yahoo.com>

Bumps [k8s.io/kube-aggregator](https://github.com/kubernetes/kube-aggregator) from 0.27.4 to 0.28.0. - [Commits](kubernetes/kube-aggregator@v0.27.4...v0.28.0) --- updated-dependencies: - dependency-name: k8s.io/kube-aggregator dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Signed-off-by: Faisal Memon <fymemon@yahoo.com>

Quite some time ago we added a scan to first warn about and then eventually delete entries with invalid SPIFFE IDs. This scan is no longer needed, since such entries will already have been removed by previous upgrades, so the scan can be removed. Signed-off-by: Andrew Harding <azdagron@gmail.com> Co-authored-by: Marcos Yacob <marcos.yacob@hpe.com> Signed-off-by: Faisal Memon <fymemon@yahoo.com>

Signed-off-by: Faisal Memon <fymemon@yahoo.com>

Signed-off-by: Faisal Memon <fymemon@yahoo.com> Co-authored-by: Marcos Yacob <marcos.yacob@hpe.com> Signed-off-by: Faisal Memon <fymemon@yahoo.com>

Signed-off-by: Faisal Memon <fymemon@yahoo.com>

Signed-off-by: Faisal Memon <fymemon@yahoo.com> Co-authored-by: Marcos Yacob <marcos.yacob@hpe.com> Signed-off-by: Faisal Memon <fymemon@yahoo.com>

Signed-off-by: Faisal Memon <fymemon@yahoo.com>

faisal-memon · 2023-09-12T23:15:13Z

DCO fixed and branch up to date with latest.

MarcosDY · 2023-09-13T15:18:24Z

pkg/server/datastore/sqlstore/sqlstore.go

+func listRegistrationEntriesEvents(tx *gorm.DB, req *datastore.ListRegistrationEntriesEventsRequest) (*datastore.ListRegistrationEntriesEventsResponse, error) {
+	var events []RegisteredEntryEvent


Since this is experimental, it must be called only when the feature flag is enabled.

Suggested change

-func listRegistrationEntriesEvents(tx *gorm.DB, req *datastore.ListRegistrationEntriesEventsRequest) (*datastore.ListRegistrationEntriesEventsResponse, error) {
-	var events []RegisteredEntryEvent
+func listRegistrationEntriesEvents(tx *gorm.DB, req *datastore.ListRegistrationEntriesEventsRequest) (*datastore.ListRegistrationEntriesEventsResponse, error) {
+	if !fflag.IsSet(fflag.FlagEventsBasedCache) {
+		return &datastore.ListRegistrationEntriesEventsResponse{}, nil
+	}
+	var events []RegisteredEntryEvent

MarcosDY · 2023-09-13T15:19:12Z

pkg/server/datastore/sqlstore/sqlstore.go

+func pruneRegistrationEntriesEvents(tx *gorm.DB, olderThan time.Duration) error {
+	if err := tx.Where("created_at < ?", time.Now().Add(-olderThan)).Delete(&RegisteredEntryEvent{}).Error; err != nil {


Since this is experimental, it must be called only when the feature flag is enabled.

Suggested change

-func pruneRegistrationEntriesEvents(tx *gorm.DB, olderThan time.Duration) error {
-	if err := tx.Where("created_at < ?", time.Now().Add(-olderThan)).Delete(&RegisteredEntryEvent{}).Error; err != nil {
+func pruneRegistrationEntriesEvents(tx *gorm.DB, olderThan time.Duration) error {
+	if !fflag.IsSet(fflag.FlagEventsBasedCache) {
+		return nil
+	}
+	if err := tx.Where("created_at < ?", time.Now().Add(-olderThan)).Delete(&RegisteredEntryEvent{}).Error; err != nil {

Signed-off-by: Marcos Yacob <marcos.yacob@hpe.com>

faisal-memon requested review from evan2645, amartinezfayo, azdagron, MarcosDY and rturner3 as code owners August 4, 2023 20:47

rturner3 assigned azdagron Aug 8, 2023

amartinezfayo reviewed Aug 9, 2023


cmd/spire-server/cli/run/run.go (outdated, resolved)

MarcosDY reviewed Aug 15, 2023


azdagron reviewed Aug 17, 2023


pkg/server/datastore/sqlstore/sqlstore.go (outdated, resolved)

pkg/server/datastore/sqlstore/sqlstore.go (outdated, resolved)

evan2645 added this to the 1.8.0 milestone Aug 22, 2023

evan2645 mentioned this pull request Aug 22, 2023

New Mutable Authorized Entry Cache #4451 (merged)

faisal-memon commented Aug 24, 2023


pkg/server/datastore/sqlstore/sqlstore.go (outdated, resolved)

azdagron reviewed Aug 31, 2023


pkg/server/datastore/sqlstore/sqlstore.go (resolved)

MarcosDY reviewed Sep 4, 2023


stevend-uber mentioned this pull request Sep 11, 2023

ListEntries times out for ~700k+ registrations. #4488 (closed)

faisal-memon and others added 11 commits September 12, 2023 15:32

Populate and prune entry event table

716fde0

Signed-off-by: Faisal Memon <fymemon@yahoo.com>

[docker buildx] Create tls context if needed (spiffe#4405)

bbd4621

Signed-off-by: Zack Train <ztrain@uber.com> Signed-off-by: Faisal Memon <fymemon@yahoo.com>

dependabot bot and others added 16 commits September 12, 2023 15:32

Clean up fflag in sqlstore.go

b5276d4

Signed-off-by: Faisal Memon <fymemon@yahoo.com>

Remove mysql specifics

aaa845e

Signed-off-by: Faisal Memon <fymemon@yahoo.com>

Update pruning interval

b136c60

Signed-off-by: Faisal Memon <fymemon@yahoo.com>

Add prune unit tests, sql lite support

59d25b7

Signed-off-by: Faisal Memon <fymemon@yahoo.com>

Make prune test more resilient

eff8bb3

Signed-off-by: Faisal Memon <fymemon@yahoo.com>

Fix postgres issues with pruning

12767b8

Signed-off-by: Faisal Memon <fymemon@yahoo.com>

Add more unit tests for listing events

1e9f012

Signed-off-by: Faisal Memon <fymemon@yahoo.com>

Add test for prune events task

939ad99

Signed-off-by: Faisal Memon <fymemon@yahoo.com>

Remove pagination

7aa0feb

Signed-off-by: Faisal Memon <fymemon@yahoo.com>

Remove platform specific prune functions

bce8751

Signed-off-by: Faisal Memon <fymemon@yahoo.com>

Update pkg/common/telemetry/server/datastore/event.go

dbcd0a8

Signed-off-by: Faisal Memon <fymemon@yahoo.com> Co-authored-by: Marcos Yacob <marcos.yacob@hpe.com> Signed-off-by: Faisal Memon <fymemon@yahoo.com>

Add unit tests for configurable

c1296d0

Signed-off-by: Faisal Memon <fymemon@yahoo.com>

Update pkg/server/endpoints/entryfetcher_test.go

a2aaa2a

Signed-off-by: Faisal Memon <fymemon@yahoo.com> Co-authored-by: Marcos Yacob <marcos.yacob@hpe.com> Signed-off-by: Faisal Memon <fymemon@yahoo.com>

Add ctx.Done()

094a8ad

Signed-off-by: Faisal Memon <fymemon@yahoo.com>

faisal-memon force-pushed the db-api branch from 9ae1846 to 094a8ad September 12, 2023 22:32

faisal-memon added 2 commits September 12, 2023 16:07

Merge branch 'main' into db-api

bdfae23

Fix some rebase conflicts

d8f9649

Signed-off-by: Faisal Memon <fymemon@yahoo.com>

MarcosDY previously approved these changes Sep 13, 2023


MarcosDY reviewed Sep 13, 2023


run prune events only when feature flag is active

df60b25

Signed-off-by: Marcos Yacob <marcos.yacob@hpe.com>

MarcosDY dismissed their stale review via df60b25 September 13, 2023 18:34

azdagron approved these changes Sep 13, 2023


MarcosDY and others added 2 commits September 13, 2023 16:16

Merge branch 'main' into db-api

79c2e75

Merge branch 'main' into db-api

117f56c

rturner3 merged commit 7a5a528 into spiffe:main Sep 13, 2023

azdagron mentioned this pull request Sep 15, 2023

Dynamic authorized entry cache #4498 (closed, 7 tasks)

faisal-memon deleted the db-api branch September 26, 2023 21:25

