-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ops: add ttl metrics for certificate expiration #130110
ops: add ttl metrics for certificate expiration #130110
Conversation
) | ||
|
||
func makeMetrics() Metrics { | ||
func expirationGauge(metadata metric.Metadata, ci *CertInfo) *metric.Gauge { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should these lock the certificate manager while they're being read?
// the correct expiration and ttl values are set on the metrics vars that are | ||
// exposed to our collectors. It uses a single key pair to approximate the | ||
// behavior across other key pairs. | ||
func TestMetricsValues(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only testing the CA certs for TTL and expiration, open to going beyond that as well
d1d4eaa
to
029bb3f
Compare
@@ -100,6 +100,8 @@ func (p PemUsage) String() string { | |||
return "Client" | |||
case TenantPem: | |||
return "Tenant Client" | |||
case TenantSigningPem: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing string value
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! Overall changes look good, just some small comments on naming and commit hygiene.
Reviewed 5 of 6 files at r1, 1 of 1 files at r2, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @angles-n-daemons)
-- commits
line 2 at r2:
Please add the headling ops:...
to the commit message too.
-- commits
line 7 at r2:
I'm a little confused about this release note - not sure if this security:
part is misleading. It seems like this release note is more suitable as the commit message body. We don't need to add any comments on refactoring in release notes - they are for documenting user facing changes.
https://cockroachlabs.atlassian.net/wiki/spaces/CRDB/pages/73072807/Git+Commit+Messages
pkg/security/certificate_metrics.go
line 175 at r1 (raw file):
Previously, angles-n-daemons (Brian Dillmann) wrote…
Should these lock the certificate manager while they're being read?
This fn seems fine, as it's not actually reading from the cm directly and the ci passed in isn't expected to change. Left a comment on the caller of this fn regarding the locking.
pkg/security/certificate_metrics.go
line 132 at r2 (raw file):
metaClientTTL = metric.Metadata{ Name: "security.certificate.ttl.client", Help: "Seconds till expiration for the client certificates., labeled by SQL user. 0 means expired, no certificate or error.",
nit: extra .
after certificates
pkg/security/certificate_metrics.go
line 202 at r2 (raw file):
// metric to zero. // cm.mu must be held to protect the certificates. Metrics do their own atomicity. func (cm *CertificateManager) createMetrics() Metrics {
It seems like this has to be called when the cm is locked since it's accessing fields protected under mu
. Do you mind just renaming this to createMetricsLocked
to signal that? I see that it's being called in LoadCertificates which locks it so the usages are good!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 4 of 6 files at r1, 1 of 1 files at r2, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @angles-n-daemons and @xinhaoz)
pkg/security/certificate_manager.go
line 115 at r2 (raw file):
tenantIdentifier uint64 // timeSource, if set, specifies the time source with which the metrics are set
nit
Suggestion:
set.
pkg/security/certificate_manager.go
line 132 at r2 (raw file):
// WithTimeSource allows the caller to pass a time source to be used // by the Metrics struct (mostly for testing)
nit
Suggestion:
(mostly for testing).
pkg/security/certificate_metrics.go
line 160 at r2 (raw file):
Unit: metric.Unit_TIMESTAMP_SEC, }
nit: extra new line
pkg/security/certificate_loader_test.go
line 130 at r2 (raw file):
} var defaultExpiration = timeutil.Now().Add(time.Hour)
Should this initialization happen within the test?
This value might get set way before the test using it is actually being executed.
pkg/security/certificate_metrics_test.go
line 46 at r2 (raw file):
{"ca.crt", 0777, caCert}, {"ca.key", 0777, []byte("ca.key")}, } {
Could this also have the other node/client/tenant certificates?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @pritesh-lahoti and @xinhaoz)
Previously, xinhaoz (Xin Hao Zhang) wrote…
Please add the headling
ops:...
to the commit message too.
ah makes sense, will do
Previously, xinhaoz (Xin Hao Zhang) wrote…
I'm a little confused about this release note - not sure if this
security:
part is misleading. It seems like this release note is more suitable as the commit message body. We don't need to add any comments on refactoring in release notes - they are for documenting user facing changes.
https://cockroachlabs.atlassian.net/wiki/spaces/CRDB/pages/73072807/Git+Commit+Messages
gotcha, thanks for pointing this out
pkg/security/certificate_manager.go
line 115 at r2 (raw file):
Previously, pritesh-lahoti wrote…
nit
👀
pkg/security/certificate_manager.go
line 132 at r2 (raw file):
Previously, pritesh-lahoti wrote…
nit
👀
pkg/security/certificate_metrics.go
line 132 at r2 (raw file):
Previously, xinhaoz (Xin Hao Zhang) wrote…
nit: extra
.
after certificates
👀
pkg/security/certificate_metrics.go
line 160 at r2 (raw file):
Previously, pritesh-lahoti wrote…
nit: extra new line
👀
pkg/security/certificate_metrics.go
line 202 at r2 (raw file):
Previously, xinhaoz (Xin Hao Zhang) wrote…
It seems like this has to be called when the cm is locked since it's accessing fields protected under
mu
. Do you mind just renaming this tocreateMetricsLocked
to signal that? I see that it's being called in LoadCertificates which locks it so the usages are good!
make sense! I'll make that naming change, definitely seems better
pkg/security/certificate_loader_test.go
line 130 at r2 (raw file):
Previously, pritesh-lahoti wrote…
Should this initialization happen within the test?
This value might get set way before the test using it is actually being executed.
What's your preference? Right now this primarily used as a noop
pkg/security/certificate_metrics_test.go
line 46 at r2 (raw file):
Previously, pritesh-lahoti wrote…
Could this also have the other node/client/tenant certificates?
yeah I'll give this a go
029bb3f
to
842b982
Compare
Missing: update on the client certificate TTL aggregate gauge. Need to add that before merging |
842b982
to
5687239
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 2 of 3 files at r3, 4 of 4 files at r4, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aa-joshi, @angles-n-daemons, @arjunmahishi, and @xinhaoz)
pkg/security/cert_expiry_cache.go
line 123 at r4 (raw file):
// GetTTL retrieves seconds till cert expiration for the given username, if it exists. // An expiration of 0 indicates an entry was not found.
Suggestion:
A TTL of 0
pkg/security/cert_expiry_cache.go
line 224 at r4 (raw file):
// timeNow returns the current time depending on the time source of the cache. func (c *ClientCertExpirationCache) timeNow() int64 {
nit: extra line
pkg/util/metric/aggmetric/gauge.go
line 183 at r4 (raw file):
} // Updates the function on the gauge
nit
Suggestion:
// UpdateFn updates the function on the gauge.
pkg/security/certificate_loader_test.go
line 130 at r2 (raw file):
Previously, angles-n-daemons (Brian Dillmann) wrote…
What's your preference? Right now this primarily used as a noop
I believe moving it within the test will be resilient to potential flakiness.
Currently, cockroach only exposes point in time certificate expiration metrics. If the certificate is to expire 1 day from now, we expose a gauge `security.certificate.expiration.<cert-type>` which is the unix timestamp when it will expire. This PR also exposes a ttl metric `security.certificate.ttl.<cert-type>` so that consumers of this information can run operations based on their distance to certificate expiration without additional transformations. Additionally, this PR refactors how the expiration gauges are set, so that reads of the gauge directly reference the value of the certificate. Epic: CRDB-40209 Fixes: cockroachdb#77376 Release note (ops change): new metrics which expose the ttl for various certificates
5687239
to
c830973
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for all the changes!
Reviewed 3 of 3 files at r5, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @aa-joshi, @angles-n-daemons, @arjunmahishi, and @xinhaoz)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bors r+ |
ops: add ttl metrics for certificate expiration
Currently, cockroach only exposes point in time certificate
expiration metrics. If the certificate is to expire 1 day from now,
we expose a gauge
security.certificate.expiration.<cert-type>
which is the unix timestamp when it will expire. This PR also
exposes a ttl metric
security.certificate.ttl.<cert-type>
so thatconsumers of this information can run operations based on their
distance to certificate expiration without additional
transformations.
Additionally, this PR refactors how the expiration gauges are set,
so that reads of the gauge directly reference the value of the
certificate.
Epic: CRDB-40209
Fixes: #77376
Release note (ops change): new metrics which expose the ttl for various
certificates