Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test: adding unit tests for dapr and updating dapr sdk version #2846

Merged
merged 10 commits into from
Jul 24, 2023

Conversation

JaydipGabani
Copy link
Contributor

What this PR does / why we need it: Adding unit tests for pubsub system and dapr driver

Which issue(s) this PR fixes (optional, using fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when the PR gets merged):
Fixes #2800

Special notes for your reviewer:

@codecov-commenter
Copy link

codecov-commenter commented Jun 22, 2023

Codecov Report

Patch coverage: 10.33% and project coverage change: -0.54 ⚠️

Comparison is base (1076798) 53.60% compared to head (31afce7) 53.07%.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2846      +/-   ##
==========================================
- Coverage   53.60%   53.07%   -0.54%     
==========================================
  Files         133      135       +2     
  Lines       11545    11790     +245     
==========================================
+ Hits         6189     6257      +68     
- Misses       4880     5047     +167     
- Partials      476      486      +10     
Flag Coverage Δ
unittests 53.07% <10.33%> (-0.54%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
pkg/audit/manager.go 9.83% <0.00%> (+0.19%) ⬆️
pkg/controller/pubsub/pubsub_config_controller.go 11.68% <ø> (ø)
pkg/pubsub/provider/fake_provider.go 0.00% <0.00%> (ø)
pkg/pubsub/dapr/fake_dapr_client.go 9.37% <9.37%> (ø)
pkg/pubsub/system.go 73.07% <50.00%> (+69.15%) ⬆️
pkg/pubsub/dapr/dapr.go 69.04% <100.00%> (+45.23%) ⬆️

... and 1 file with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

pkg/pubsub/system_test.go Show resolved Hide resolved
pkg/pubsub/system_test.go Show resolved Hide resolved
@@ -253,7 +253,7 @@ func (am *Manager) audit(ctx context.Context) error {

err := am.addAuditResponsesToUpdateLists(updateLists, res, totalViolationsPerConstraint, totalViolationsPerEnforcementAction, timestamp)
if errs != nil {
return err
am.log.Error(err, "Auditing")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we squashing errors here and below?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before the pub-sub change addAuditResponsesToUpdateLists didn't really generate any error. By returning "Publishing" err and not "Audit" errors at line 256 and below, we are disrupting the audit, which results in constraintStatus not being updated with violations.

I am also open to ideas on handling publishing errors in other ways.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But we are not squashing pubsub errors, we are squashing any error returned by addAuditResponsesToUpdateLists(), which seems much broader in scope than what you're intending (merely swallowing pubsub errors).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, anticipating another comment:

What is our contract with users with regards to failing to publish an audit message?

The way this is written, it appears we are giving up on first failure, even if there is something like a transient networking issue.

What is our contract with users? When will events fail to be published?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But we are not squashing pubsub errors, we are squashing any error returned by addAuditResponsesToUpdateLists(), which seems much broader in scope than what you're intending (merely swallowing pubsub errors).

before pub-sub change, addAuditResponsesToUpdateLists didn't generate any error at all, it returned nil. So any errors returned now are only errors encountered in publishing the message. 🤔

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ritazh @sozercan would be helpful to have your voice as to user expectations here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few chains of thought -

  1. Popular pubsub tools might already have a retry mechanism in place. Can we assume that when a driver fails to publish a message, the underlying provider/tool has exhausted the retries?
  2. Implementing retries on top of tool's retries might cause overhead for the messages that are not getting delivered because of legit errors.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay in looking at this.

@maxsmythe i think @JaydipGabani's point is addAuditResponsesToUpdateLists never returned an err before pubsub (it had the return type as an error, which is a bug I believe - let me know if I am missing something) https://github.com/open-policy-agent/gatekeeper/blob/v3.10.0/pkg/audit/manager.go#L693
So previously (pre pubsub), audit was fault tolerant and we always published to constraints

however, now that pubsub is actually returning an error as part of addAuditResponsesToUpdateLists this would have unintended consequences of actually returning an error and skipping the constraint update.
I think @JaydipGabani's change is intending to restore the previous behavior so we still update constraints even if there is an error.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think at some point we stopped returning errors b/c it was poorly-behaved. In general, audit code could probably use a refactor at some point. The core point is that we probably shouldn't be swallowing all errors unless it is part of some principled design for audit (which, again, we should do at some point).

If all we are doing is avoiding rocking the boat, then we should swallow pubsub failures close to the source. Anything grander is getting into redesign territory, and requires an understanding of audit as a whole to write/review.

Regardless, the larger point of this discussion thread is: what are the user expectations with regard to failed pubsub requests?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, the user expectation should be we return as many violations as possible even when we encounter errors. The audit process should be fault tolerant in that we log the errors, but an error should not impact all violations.

pkg/controller/pubsub/pubsub_config_controller.go Outdated Show resolved Hide resolved
@JaydipGabani
Copy link
Contributor Author

/benchmark

@github-actions
Copy link
Contributor

github-actions bot commented Jul 7, 2023

@acpana acpana changed the title chore: adding unit tests for dapr and updating dapr sdk version test: adding unit tests for dapr and updating dapr sdk version Jul 10, 2023
@ritazh ritazh added this to the v3.13.0 milestone Jul 12, 2023
pkg/pubsub/system.go Outdated Show resolved Hide resolved
pkg/audit/manager.go Outdated Show resolved Hide resolved
Copy link
Contributor

@maxsmythe maxsmythe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM after mutex nit

pkg/pubsub/system_test.go Outdated Show resolved Hide resolved
pkg/audit/manager.go Outdated Show resolved Hide resolved
pkg/pubsub/system_test.go Outdated Show resolved Hide resolved
Copy link
Member

@sozercan sozercan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a few nits and needs rebase, otherwise LGTM

Copy link
Member

@sozercan sozercan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a few nits and needs rebase, otherwise LGTM

contrib.go.opencensus.io/exporter/ocagent v0.7.0
contrib.go.opencensus.io/exporter/prometheus v0.4.2
contrib.go.opencensus.io/exporter/stackdriver v0.13.14
github.com/dapr/go-sdk v1.6.0
github.com/dapr/go-sdk v1.8.0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may need to start documenting dapr version compatibilities for each version of GK. wdyt?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I could add it to doc PR 🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have any specific place in mind for this information?

Copy link
Member

@ritazh ritazh Jul 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For docs, might be good to add to the pubsub dapr provider docs. But we need to remember to update that doc whenever this dependency is bumped. Not very ideal.

For the daprClient, is there any minimum required version we need to test? Does the dapr sdk return its own version? If so, might be good to include that in the log as the daprClient is created.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As follow up when we implement batch publishing, 1.8.0 will be the minimum required version, and dapr 1.11 will be the minimum compatible version.

Does the dapr sdk return its own version?

yes it does, but I am not sure how user can figure out which dapr runtime to install before hand if daprClient version information is going to be logged when user has installed all prerequisites and trying to initiate a connection using dapr.

Can we just say in the provider doc that, "current dapr sdk version is [link to gk go mod file]" and point to dapr-version for more information?

Signed-off-by: Jaydip Gabani <gabanijaydip@gmail.com>
Signed-off-by: Jaydip Gabani <gabanijaydip@gmail.com>
Signed-off-by: Jaydip Gabani <gabanijaydip@gmail.com>
Signed-off-by: Jaydip Gabani <gabanijaydip@gmail.com>
Signed-off-by: Jaydip Gabani <gabanijaydip@gmail.com>
Signed-off-by: Jaydip Gabani <gabanijaydip@gmail.com>
Signed-off-by: Jaydip Gabani <gabanijaydip@gmail.com>
Signed-off-by: Jaydip Gabani <gabanijaydip@gmail.com>
Signed-off-by: Jaydip Gabani <gabanijaydip@gmail.com>
Signed-off-by: Jaydip Gabani <gabanijaydip@gmail.com>
Copy link
Member

@ritazh ritazh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@sozercan sozercan merged commit 5f04a2c into open-policy-agent:master Jul 24, 2023
14 of 15 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add unit tests to cover pubsub system and dapr driver
5 participants