Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

docs: update flagd provider specs #1432

Merged
merged 19 commits into from
Oct 30, 2024
Merged

docs: update flagd provider specs #1432

merged 19 commits into from
Oct 30, 2024

Conversation

toddbaert
Copy link
Member

@toddbaert toddbaert commented Oct 28, 2024

This PR contains significant enhancements to flagd provider specs. If merged, I will be opening a bunch of issues to implement what's described here based on recon I've been doing with the existing implementations

Specifically:

  • adds retryGraceAttempts param, which defines the amount of stream retry attempts before provider moves from STALE to ERROR state
  • adds contextEnricher param, which defines mapping function for sync-metadata to evaluation context for in process providers (exists already in Java provider)
  • improves consistency between in-process and RPC stream reconnect behavior
  • simplifies provider doc and spec to remove duplication and improve readability

@toddbaert toddbaert requested a review from a team as a code owner October 28, 2024 18:43
@dosubot dosubot bot added the size:XL This PR changes 500-999 lines, ignoring generated files. label Oct 28, 2024
Copy link

netlify bot commented Oct 28, 2024

Deploy Preview for polite-licorice-3db33c ready!

Name Link
🔨 Latest commit 075efb7
🔍 Latest deploy log https://app.netlify.com/sites/polite-licorice-3db33c/deploys/672275579bdd330008eb1037
😎 Deploy Preview https://deploy-preview-1432--polite-licorice-3db33c.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

Comment on lines +126 to +127
'reference/specifications/rpc-providers.md': 'reference/specifications/providers.md#rpc-providers'
'reference/specifications/in-process-providers.md': 'reference/specifications/providers.md#in-process-providers'
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Redirects from old pages.

Comment on lines +16 to +17
MD007:
indent: 4
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mkdocs doesn't render nested lists properly unless they are double indented (4 spaces) so I've added this rule.

| streamDeadlineMs | FLAGD_STREAM_DEADLINE_MS | deadline for streaming calls, useful as an application-layer keepalive | int | 600000 | rpc & in-process |
| retryBackoffMs | FLAGD_RETRY_BACKOFF_MS | initial backoff for stream retry | int | 1000 | rpc & in-process |
| retryBackoffMaxMs | FLAGD_RETRY_BACKOFF_MAX_MS | maximum backoff for stream retry | int | 120000 | rpc & in-process |
| retryGraceAttempts | FLAGD_RETRY_GRACE_ATTEMPTS | amount of stream retry attempts before provider moves from STALE to ERROR state | int | 5 | rpc & in-process |
Copy link
Member Author

@toddbaert toddbaert Oct 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

retryGraceAttempts is a new param, not yet implemented in any provider but I think this is nicer than the "silent first retry" we have in Java, and solves the same problem in a more sensible way.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the default be 5? With exponential backoff this would mean 31s.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ya, 15-30 seemed correct to me, personally, since I think most services and infra would be able to cycle within that time. 5 was more or less based on that idea.

Open to other arguments here though!

Comment on lines 245 to 247
In-process flagd providers also inject any properties returned by the [sync-metadata RPC response](./protos.md#getmetadataresponse) into the context.
This allows for static properties defined in flagd to be added to in-process evaluations.
If only a subset of the sync-metadata response is desired to be injected into the evaluation context, you can use the define a mapping function with the `contextEnricher` option.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't yet actually support the addition of arbitrary props into the evaluation context in flagd itself. If this is merged, I will create an issue for that.

Signed-off-by: Todd Baert <todd.baert@dynatrace.com>
Signed-off-by: Todd Baert <todd.baert@dynatrace.com>
Signed-off-by: Todd Baert <todd.baert@dynatrace.com>
Copy link
Member

@guidobrei guidobrei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @toddbaert for consolidating all the different provider implementations into this spec. 🥇

| streamDeadlineMs | FLAGD_STREAM_DEADLINE_MS | deadline for streaming calls, useful as an application-layer keepalive | int | 600000 | rpc & in-process |
| retryBackoffMs | FLAGD_RETRY_BACKOFF_MS | initial backoff for stream retry | int | 1000 | rpc & in-process |
| retryBackoffMaxMs | FLAGD_RETRY_BACKOFF_MAX_MS | maximum backoff for stream retry | int | 120000 | rpc & in-process |
| retryGraceAttempts | FLAGD_RETRY_GRACE_ATTEMPTS | amount of stream retry attempts before provider moves from STALE to ERROR state | int | 5 | rpc & in-process |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the default be 5? With exponential backoff this would mean 31s.

docs/reference/specifications/providers.md Outdated Show resolved Hide resolved
docs/reference/specifications/providers.md Outdated Show resolved Hide resolved
- RPC mode resolves `STALE` from cache where possible
- in-process mode resolves `STALE` from stored `flag set` rules
- on stream reconnection:
- emit `PROVIDER_READY` and `PROVIDER_CONFIGURATION_CHANGED`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should it emit PROVIDER_CONFIGURATION_CHANGED if we reconnect and the config did not change in the meantime?

Copy link
Member Author

@toddbaert toddbaert Oct 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ya, we don't know if the config has changed, since we could have missed change events, so we fire a change regardless to make sure any change handlers run; change handlers are only a hook to cause additional evaluations, so if no changes have actually happened, it's not problem (flag values will just be the same).

This is how the Java provider currently works. We could just only run READY in this case, but IMO that's risky since it is possible that a missed change event would never be detected and handlers which re-evaluate flags never run, so I consumer stays out of sync.

docs/reference/specifications/providers.md Outdated Show resolved Hide resolved

### Custom Name Resolution

Some implementations support [gRPC custom name resolution](https://grpc.io/docs/guides/custom-name-resolution/), and abstractions to introduce additional resolvers.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some implementations support...

In the provider spec we should clarify if this feature is optional. But if it's implemented it should be consistent across implementations.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ya the main difficulty is not all gRPC implementations actually have this feature: https://grpc.io/docs/guides/custom-name-resolution/#language-support

docs/reference/specifications/providers.md Outdated Show resolved Hide resolved
docs/reference/specifications/providers.md Outdated Show resolved Hide resolved
| retryBackoffMs | FLAGD_RETRY_BACKOFF_MS | initial backoff for stream retry | int | 1000 | rpc & in-process |
| retryBackoffMaxMs | FLAGD_RETRY_BACKOFF_MAX_MS | maximum backoff for stream retry | int | 120000 | rpc & in-process |
| retryGraceAttempts | FLAGD_RETRY_GRACE_ATTEMPTS | amount of stream retry attempts before provider moves from STALE to ERROR state | int | 5 | rpc & in-process |
| keepAliveTime | FLAGD_KEEP_ALIVE_TIME_MS | http 2 keepalive | long | 0 | rpc & in-process |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we still have HTTP 2 keepAlive support?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The gRPC keepalive is just an HTTP 2 keepalive.

Though considering how it didn't help us that much, we could consider not adding it at all. WDYT?

docs/reference/specifications/providers.md Outdated Show resolved Hide resolved
toddbaert and others added 5 commits October 29, 2024 09:30
Co-authored-by: Guido Breitenhuber <157148191+guidobrei@users.noreply.github.com>
Signed-off-by: Todd Baert <todd.baert@dynatrace.com>
Co-authored-by: Guido Breitenhuber <157148191+guidobrei@users.noreply.github.com>
Signed-off-by: Todd Baert <todd.baert@dynatrace.com>
Co-authored-by: Guido Breitenhuber <157148191+guidobrei@users.noreply.github.com>
Signed-off-by: Todd Baert <todd.baert@dynatrace.com>
Co-authored-by: Guido Breitenhuber <157148191+guidobrei@users.noreply.github.com>
Signed-off-by: Todd Baert <todd.baert@dynatrace.com>
Co-authored-by: Guido Breitenhuber <157148191+guidobrei@users.noreply.github.com>
Signed-off-by: Todd Baert <todd.baert@dynatrace.com>
toddbaert and others added 2 commits October 29, 2024 09:49
Co-authored-by: Guido Breitenhuber <157148191+guidobrei@users.noreply.github.com>
Signed-off-by: Todd Baert <todd.baert@dynatrace.com>
Co-authored-by: Guido Breitenhuber <157148191+guidobrei@users.noreply.github.com>
Signed-off-by: Todd Baert <todd.baert@dynatrace.com>
toddbaert and others added 2 commits October 29, 2024 09:59
Co-authored-by: Guido Breitenhuber <157148191+guidobrei@users.noreply.github.com>
Signed-off-by: Todd Baert <todd.baert@dynatrace.com>
Co-authored-by: Guido Breitenhuber <157148191+guidobrei@users.noreply.github.com>
Signed-off-by: Todd Baert <todd.baert@dynatrace.com>
Signed-off-by: Todd Baert <todd.baert@dynatrace.com>
Signed-off-by: Todd Baert <todd.baert@dynatrace.com>
Signed-off-by: Todd Baert <todd.baert@dynatrace.com>
@toddbaert
Copy link
Member Author

toddbaert commented Oct 29, 2024

Build failure is due to Trivvy rate limit.

Signed-off-by: Todd Baert <todd.baert@dynatrace.com>
@@ -73,18 +35,19 @@ The lifecycle is summarized below:
- if stream connection fails or exceeds the time specified by `deadline`, abort initialization (SDK will emit `PROVIDER_ERROR`), and attempt to [reconnect](#stream-reconnection)
- while connected:
- flags are resolved according to resolver mode; either by calling evaluation RPCs, or by evaluating the stored `flag set` rules
- for RPC providers, flags resolved with `reason=STATIC` are [cached](#flag-evaluation-caching)
- for RPC providers, flags resolved with `reason=STATIC` are [cached](#flag-evaluation-caching)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is under "while connected" so I think it's fine as is.

Signed-off-by: Todd Baert <todd.baert@dynatrace.com>
beeme1mr and others added 2 commits October 30, 2024 14:00
Signed-off-by: Todd Baert <todd.baert@dynatrace.com>
@toddbaert toddbaert merged commit a19cb42 into main Oct 30, 2024
16 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
size:XL This PR changes 500-999 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants