-
Notifications
You must be signed in to change notification settings - Fork 743
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
test: adding unit tests for dapr and updating dapr sdk version #2846
Conversation
Codecov ReportPatch coverage:
Additional details and impacted files@@ Coverage Diff @@
## master #2846 +/- ##
==========================================
- Coverage 53.60% 53.07% -0.54%
==========================================
Files 133 135 +2
Lines 11545 11790 +245
==========================================
+ Hits 6189 6257 +68
- Misses 4880 5047 +167
- Partials 476 486 +10
Flags with carried forward coverage won't be shown. Click here to find out more.
☔ View full report in Codecov by Sentry. |
pkg/audit/manager.go
Outdated
@@ -253,7 +253,7 @@ func (am *Manager) audit(ctx context.Context) error { | |||
|
|||
err := am.addAuditResponsesToUpdateLists(updateLists, res, totalViolationsPerConstraint, totalViolationsPerEnforcementAction, timestamp) | |||
if errs != nil { | |||
return err | |||
am.log.Error(err, "Auditing") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we squashing errors here and below?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Before the pub-sub change addAuditResponsesToUpdateLists
didn't really generate any error. By returning "Publishing" err
and not "Audit" errors at line 256 and below, we are disrupting the audit, which results in constraintStatus
not being updated with violations.
I am also open to ideas on handling publishing errors in other ways.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But we are not squashing pubsub errors, we are squashing any error returned by addAuditResponsesToUpdateLists()
, which seems much broader in scope than what you're intending (merely swallowing pubsub errors).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, anticipating another comment:
What is our contract with users with regards to failing to publish an audit message?
The way this is written, it appears we are giving up on first failure, even if there is something like a transient networking issue.
What is our contract with users? When will events fail to be published?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But we are not squashing pubsub errors, we are squashing any error returned by addAuditResponsesToUpdateLists(), which seems much broader in scope than what you're intending (merely swallowing pubsub errors).
before pub-sub change, addAuditResponsesToUpdateLists
didn't generate any error at all, it returned nil
. So any errors returned now are only errors encountered in publishing the message. 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Few chains of thought -
- Popular pubsub tools might already have a retry mechanism in place. Can we assume that when a driver fails to publish a message, the underlying provider/tool has exhausted the retries?
- Implementing retries on top of tool's retries might cause overhead for the messages that are not getting delivered because of legit errors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the delay in looking at this.
@maxsmythe i think @JaydipGabani's point is addAuditResponsesToUpdateLists
never returned an err before pubsub (it had the return type as an error
, which is a bug I believe - let me know if I am missing something) https://github.com/open-policy-agent/gatekeeper/blob/v3.10.0/pkg/audit/manager.go#L693
So previously (pre pubsub), audit was fault tolerant and we always published to constraints
however, now that pubsub is actually returning an error as part of addAuditResponsesToUpdateLists
this would have unintended consequences of actually returning an error and skipping the constraint update.
I think @JaydipGabani's change is intending to restore the previous behavior so we still update constraints even if there is an error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think at some point we stopped returning errors b/c it was poorly-behaved. In general, audit code could probably use a refactor at some point. The core point is that we probably shouldn't be swallowing all errors unless it is part of some principled design for audit (which, again, we should do at some point).
If all we are doing is avoiding rocking the boat, then we should swallow pubsub failures close to the source. Anything grander is getting into redesign territory, and requires an understanding of audit as a whole to write/review.
Regardless, the larger point of this discussion thread is: what are the user expectations with regard to failed pubsub requests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO, the user expectation should be we return as many violations as possible even when we encounter errors. The audit process should be fault tolerant in that we log the errors, but an error should not impact all violations.
/benchmark |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM after mutex nit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a few nits and needs rebase, otherwise LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a few nits and needs rebase, otherwise LGTM
contrib.go.opencensus.io/exporter/ocagent v0.7.0 | ||
contrib.go.opencensus.io/exporter/prometheus v0.4.2 | ||
contrib.go.opencensus.io/exporter/stackdriver v0.13.14 | ||
github.com/dapr/go-sdk v1.6.0 | ||
github.com/dapr/go-sdk v1.8.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We may need to start documenting dapr version compatibilities for each version of GK. wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I could add it to doc PR 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you have any specific place in mind for this information?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For docs, might be good to add to the pubsub dapr provider docs. But we need to remember to update that doc whenever this dependency is bumped. Not very ideal.
For the daprClient, is there any minimum required version we need to test? Does the dapr sdk return its own version? If so, might be good to include that in the log as the daprClient is created.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As follow up when we implement batch publishing, 1.8.0
will be the minimum required version, and dapr
1.11 will be the minimum compatible version.
Does the dapr sdk return its own version?
yes it does, but I am not sure how user can figure out which dapr runtime to install before hand if daprClient version information is going to be logged when user has installed all prerequisites and trying to initiate a connection using dapr.
Can we just say in the provider doc that, "current dapr sdk version is [link to gk go mod file]" and point to dapr-version for more information?
Signed-off-by: Jaydip Gabani <gabanijaydip@gmail.com>
Signed-off-by: Jaydip Gabani <gabanijaydip@gmail.com>
Signed-off-by: Jaydip Gabani <gabanijaydip@gmail.com>
Signed-off-by: Jaydip Gabani <gabanijaydip@gmail.com>
Signed-off-by: Jaydip Gabani <gabanijaydip@gmail.com>
Signed-off-by: Jaydip Gabani <gabanijaydip@gmail.com>
Signed-off-by: Jaydip Gabani <gabanijaydip@gmail.com>
Signed-off-by: Jaydip Gabani <gabanijaydip@gmail.com>
Signed-off-by: Jaydip Gabani <gabanijaydip@gmail.com>
Signed-off-by: Jaydip Gabani <gabanijaydip@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
What this PR does / why we need it: Adding unit tests for pubsub system and dapr driver
Which issue(s) this PR fixes (optional, using
fixes #<issue number>(, fixes #<issue_number>, ...)
format, will close the issue(s) when the PR gets merged):Fixes #2800
Special notes for your reviewer: