-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Move cert loading failure assertion on client public key write #2655
Move cert loading failure assertion on client public key write #2655
Conversation
Codecov Report
@@ Coverage Diff @@
## master #2655 +/- ##
==========================================
+ Coverage 95.09% 95.10% +0.01%
==========================================
Files 213 213
Lines 9491 9491
==========================================
+ Hits 9025 9026 +1
Misses 388 388
+ Partials 78 77 -1
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to the comment on
// If mismatched keys are present in the cert, the "Failed to load certificate" error will be logged instead. |
@@ -99,15 +99,7 @@ func TestReload(t *testing.T) { | |||
FilterField(zap.String("certificate", keyFile.Name())).Len() > 0 | |||
}, 100, time.Millisecond*200) | |||
|
|||
// Logged when the cert is modified with the client's public key due to | |||
// a mismatch with the existing server private key. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It sounds to me that if this doesn't always happen then we have a flaky test conditions that would result in some code branches not executed reliably. The assertion was meant to ensure that they are always executed. If we remove the assertion, it could make the test results more stable, but the code coverage less stable, which is just as bad.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the record, the usual reason for flaky test behavior is heavier CPU preemption in the shared CI environments. I see that the wait above is only for 200ms, it's quite easy for the run to hiccup for longer than that, I usually make sure the wait is at least a second. The ideal solution is to make the test conditions deterministic, i.e. if we certain that the condition will happen, then we can safely use longer waits, like 10sec, since it's unlikely the CI run will be CPU starved for that long while at the same time the long waits won't make the test run longer under normal circumstances (especially locally).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the logged error is expected, then the comment I referenced is incorrect. However there is another test that covers the mismatch case that does check for that logged error.
However agree that increasing the wait may have been a good first step.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The wait above is 200ms x 100 iterations = 20sec, which is more than enough time I think.
@objectiser is right that there is existing line coverage for the public/private key mismatch case under the TestReload_err_cert_update
test.
The problem as it stands is that we're making two file writes in succession then asserting on both states. If we do, however, want to have a stronger assertion on the behaviour of cert reloads, why don't we assert each state after each file write?
That is, move the failure assertion up, after writing the client's public key to the certFile
; which should give enough time for the certWatcher to pick up this change in state.
That is:
// Write the client's public key
certData, err = ioutil.ReadFile(clientCert)
require.NoError(t, err)
err = syncWrite(certFile.Name(), certData, 0644)
require.NoError(t, err)
// Wait for the cert watcher to log an error due to mismatching private/public keys
waitUntil(func() bool {
// Logged when the cert is modified with the client's public key due to
// a mismatch with the existing server private key.
return logObserver.FilterMessage("Failed to load certificate").
FilterField(zap.String("certificate", certFile.Name())).Len() > 0
}, 100, time.Millisecond*200)
assert.True(t, logObserver.
FilterMessage("Failed to load certificate").
FilterField(zap.String("certificate", certFile.Name())).Len() > 0,
"Failed to find wanted logs. All logs: "+fmt.Sprint(logObserver.All()))
// Write the client's private key
keyData, err = ioutil.ReadFile(clientKey)
require.NoError(t, err)
err = syncWrite(keyFile.Name(), keyData, 0644)
require.NoError(t, err)
... // wait for "Loaded modified certificate" as below
What do you think @yurishkuro @objectiser?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although the main state this test is interested in is the "loaded modified certificate", so checking for the failure is not strictly necessary, doing it this way may help users examining the test logs that find the error, as they would see the test was anticipating it. So +1 from me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the feedback @objectiser & @yurishkuro, I've moved the failure assertion to after writing the client's public key as suggested.
1a24f20
to
21abf3d
Compare
Pull request has been modified.
Signed-off-by: albertteoh <albert.teoh@logz.io>
Signed-off-by: albertteoh <albert.teoh@logz.io>
5a465f8
to
719c74b
Compare
assert.True(t, logObserver. | ||
FilterMessage("Failed to load certificate"). | ||
FilterField(zap.String("certificate", certFile.Name())).Len() > 0, | ||
"Failed to find wanted logs. All logs: "+fmt.Sprint(logObserver.All())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One minor point - could you differentiate this log message from the one at the end of this function - e.g. "Unable to locate 'Failed to load certificate' error in log"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, good pickup, @objectiser.
Signed-off-by: albertteoh <albert.teoh@logz.io>
fdc4e3e
to
58bfc88
Compare
Signed-off-by: albertteoh <albert.teoh@logz.io>
796731a
to
b114ab2
Compare
Pull request has been modified.
// Logged when the cert is reloaded with mismatching client public key and existing server private key. | ||
return logObserver.FilterMessage("Failed to load certificate"). | ||
FilterField(zap.String("certificate", certFile.Name())).Len() > 0 | ||
}, 100, time.Millisecond*200) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: instead of 100 * 200ms, I would do 2000 * 10ms - minimize the guaranteed delay, keep total delay the same (absolute # is not particularly important, 20s is probably overkill, but 10s would be fine, enough to outlast any scheduling hiccup).
assert.True(t, logObserver. | ||
FilterMessage("Failed to load certificate"). | ||
FilterField(zap.String("certificate", certFile.Name())).Len() > 0, | ||
"Unable to locate 'Failed to load certificate' in log. All logs: "+fmt.Sprint(logObserver.All())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: assert uses Printf syntax, you could do
"Unable to locate 'Failed to load certificate' in log. All logs: %v", logObserver.All())
Signed-off-by: albertteoh <albert.teoh@logz.io>
Signed-off-by: albertteoh albert.teoh@logz.io
Which problem is this PR solving?
Short description of the changes
"Failed to load certificate"
is logged every time; however some tests in CI are not logging this failure state, perhaps due to disk I/O variability.waitUntil
condition which should make this test more reliable.