-
Notifications
You must be signed in to change notification settings - Fork 304
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DAOS-7203 control: Add histogram support to Prometheus exporter #5382
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. No errors found by checkpatch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. No errors found by checkpatch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. No errors found by checkpatch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. No errors found by checkpatch.
c6c92f2
to
2b06d36
Compare
Bug-tracker data: |
1 similar comment
Bug-tracker data: |
Functional on EL 9 Test Results (old)135 tests ±0 130 ✅ - 1 1h 24m 30s ⏱️ - 7m 9s For more details on these errors, see this check. Results for commit 427bb06. ± Comparison against base commit 9894532. ♻️ This comment has been updated with latest results. |
Functional on EL 8.8 Test Results (old)135 tests ±0 130 ✅ - 1 1h 34m 45s ⏱️ + 5m 40s For more details on these errors, see this check. Results for commit 427bb06. ± Comparison against base commit 9894532. ♻️ This comment has been updated with latest results. |
2b06d36
to
540a8ca
Compare
Bug-tracker data: |
Bug-tracker data: |
Functional Hardware Medium Test Results (old)130 tests ±0 103 ✅ - 1 2h 12m 10s ⏱️ + 2m 18s For more details on these failures, see this check. Results for commit 427bb06. ± Comparison against base commit 9894532. ♻️ This comment has been updated with latest results. |
bfc9b03
to
9894532
Compare
540a8ca
to
1de99d7
Compare
Bug-tracker data: |
1de99d7
to
427bb06
Compare
Bug-tracker data: |
Bug-tracker data: |
Ticket title is 'Create standard DAOS dashboards for Grafana' |
Test stage Functional on EL 8.8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-5382/11/execution/node/1177/log |
FYI - I think the dkey_akey_enum_punch.py failures are being fixed in #14301 |
Actually, sorry. That looks like a different issue |
5fdafd4
to
5ac09d7
Compare
Update the Prometheus exporter to support passthrough histograms from native DAOS telemetry format. Fixes a few bugs and inefficiencies in the native histogram implementation. Features: telemetry Required-githooks: true Change-Id: I7842cc48a107ec0ba0ec93472fb6684db7394d30 Signed-off-by: Michael MacDonald <mjmac@google.com>
5ac09d7
to
757bddc
Compare
Test stage Functional Hardware Medium completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-5382/14/testReport/ |
Note to reviewers... This is finally in a state where I think it's ready for landing. I reworked the patch a bit. The original version changed the fetch/update bytes counters to histograms in addition to adding all of the logic for exporting via Prometheus. I've reverted that part of it, which should make this patch less risky. Longer term, I would like to revisit that other change, but I'll do that separately so that we can evaluate it in isolation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work. I just had a couple minor comments, which I wouldn't block on.
@@ -3200,6 +3226,8 @@ d_tm_init_histogram(struct d_tm_node_t *node, char *path, int num_buckets, | |||
rc = -DER_NO_SHMEM; | |||
goto failure; | |||
} | |||
histogram->dth_fast_bucketing = | |||
(multiplier == 2 && (initial_width & (initial_width - 1)) == 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be good to describe this special case in a comment. Is this saying that the initial width is also a power of two?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to me it reads "set dth_fast_bucketing iff mult is 2 and least-significant-bit of initial_width is set, is that correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll add a comment in a subsequent patch. The idea is easier to see with an example. Say initial_width
is 256. As a bit pattern, that value looks like this: 0b100000000
. Then, if we subtract 1 from it, we get this pattern: 0b11111111
. If we AND those together, we get 0. This only works for numbers that are clean powers of 2.
Even easier to see if we "stack" the two values:
100000000
11111111
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When you do add the comment, it would be great to detail not only the meaning of the bitwise math (although maybe that would be most clarified by using a named helper function or macro), but also why fast bucketing only works in the case where everything is a power of two. I think detailing the principle behind it will be most helpful in the future.
if err != nil { | ||
log.Errorf("[%s]: failed to get histogram buckets", baseName) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be useful to log the actual error here too?
cmpOpts := cmp.Options{ | ||
cmpopts.SortSlices(func(a, b string) bool { | ||
return a < b | ||
}), | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! IMO we should be using this in all our unit tests instead of always sorting the values we return in production code. In most cases the order really doesn't matter. To that end, I think it would be nice to add some kind of utility function in the test
package for this cmp option.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work! nothing blocking
@@ -25,8 +25,11 @@ import ( | |||
"time" | |||
) | |||
|
|||
var _ StatsMetric = (*Duration)(nil) | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: missing description comment for exported type
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not an exported type, it's an assertion that Duration
implements StatsMetric
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the comment is referring to "Duration" type below
Values []uint64 | ||
} | ||
|
||
// Histogram provides access to histogram data associated with |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: unnecessary line break
@@ -3200,6 +3226,8 @@ d_tm_init_histogram(struct d_tm_node_t *node, char *path, int num_buckets, | |||
rc = -DER_NO_SHMEM; | |||
goto failure; | |||
} | |||
histogram->dth_fast_bucketing = | |||
(multiplier == 2 && (initial_width & (initial_width - 1)) == 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to me it reads "set dth_fast_bucketing iff mult is 2 and least-significant-bit of initial_width is set, is that correct?
Update the Prometheus exporter to support passthrough histograms from native DAOS telemetry format. Fixes a few bugs and inefficiencies in the native histogram implementation. Signed-off-by: Michael MacDonald <mjmac@google.com>
… (#15484) Update the Prometheus exporter to support passthrough histograms from native DAOS telemetry format. Fixes a few bugs and inefficiencies in the native histogram implementation. Signed-off-by: Michael MacDonald <mjmac@google.com>
Update the Prometheus exporter to support passthrough histograms
from native DAOS telemetry format. Fixes a few bugs and inefficiencies
in the native histogram implementation.