Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DAOS-7203 control: Add histogram support to Prometheus exporter #5382

Merged
merged 1 commit into from
Nov 9, 2024

Conversation

mjmac
Copy link
Contributor

@mjmac mjmac commented Apr 8, 2021

Update the Prometheus exporter to support passthrough histograms
from native DAOS telemetry format. Fixes a few bugs and inefficiencies
in the native histogram implementation.

@mjmac mjmac self-assigned this Apr 8, 2021
Copy link
Collaborator

@daosbuild1 daosbuild1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. No errors found by checkpatch.

Copy link
Collaborator

@daosbuild1 daosbuild1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. No errors found by checkpatch.

Copy link
Collaborator

@daosbuild1 daosbuild1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. No errors found by checkpatch.

@mjmac mjmac force-pushed the mjmac/DAOS-7203 branch from 209de92 to c6c92f2 Compare April 23, 2021 16:13
Copy link
Collaborator

@daosbuild1 daosbuild1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. No errors found by checkpatch.

Copy link

Bug-tracker data:
Ticket title is 'Create standard DAOS dashboards for Grafana'
Status is 'Resolved'
Labels: 'rel20_TB3,rel20_TB4'
https://daosio.atlassian.net/browse/DAOS-7203

1 similar comment
Copy link

Bug-tracker data:
Ticket title is 'Create standard DAOS dashboards for Grafana'
Status is 'Resolved'
Labels: 'rel20_TB3,rel20_TB4'
https://daosio.atlassian.net/browse/DAOS-7203

@mjmac
Copy link
Contributor Author

mjmac commented Feb 17, 2024

Screenshot 2024-02-16 at 20 33 45

Copy link

github-actions bot commented Feb 17, 2024

Functional on EL 9 Test Results (old)

135 tests  ±0   130 ✅  - 1   1h 24m 30s ⏱️ - 7m 9s
 41 suites ±0     4 💤 ±0 
 41 files   ±0     0 ❌ ±0   1 🔥 +1 

For more details on these errors, see this check.

Results for commit 427bb06. ± Comparison against base commit 9894532.

♻️ This comment has been updated with latest results.

Copy link

github-actions bot commented Feb 17, 2024

Functional on EL 8.8 Test Results (old)

135 tests  ±0   130 ✅  - 1   1h 34m 45s ⏱️ + 5m 40s
 41 suites ±0     4 💤 ±0 
 41 files   ±0     0 ❌ ±0   1 🔥 +1 

For more details on these errors, see this check.

Results for commit 427bb06. ± Comparison against base commit 9894532.

♻️ This comment has been updated with latest results.

Copy link

Bug-tracker data:
Ticket title is 'Create standard DAOS dashboards for Grafana'
Status is 'Resolved'
Labels: 'rel20_TB3,rel20_TB4'
https://daosio.atlassian.net/browse/DAOS-7203

@mjmac mjmac changed the base branch from master to mjmac/agent_prom February 20, 2024 23:20
Copy link

Bug-tracker data:
Ticket title is 'Create standard DAOS dashboards for Grafana'
Status is 'Resolved'
Labels: 'rel20_TB3,rel20_TB4'
https://daosio.atlassian.net/browse/DAOS-7203

Copy link

github-actions bot commented Feb 21, 2024

Functional Hardware Large Test Results (old)

64 tests  ±0   64 ✅ ±0   28m 16s ⏱️ -26s
14 suites ±0    0 💤 ±0 
14 files   ±0    0 ❌ ±0 

Results for commit 427bb06. ± Comparison against base commit 9894532.

♻️ This comment has been updated with latest results.

Copy link

github-actions bot commented Feb 21, 2024

Functional Hardware Medium Test Results (old)

130 tests  ±0   103 ✅  - 1   2h 12m 10s ⏱️ + 2m 18s
 34 suites ±0    26 💤 ±0 
 34 files   ±0     1 ❌ +1 

For more details on these failures, see this check.

Results for commit 427bb06. ± Comparison against base commit 9894532.

♻️ This comment has been updated with latest results.

Copy link

github-actions bot commented Feb 21, 2024

Functional Hardware Medium Verbs Provider Test Results (old)

55 tests  ±0   54 ✅ ±0   4h 7m 30s ⏱️ -1s
 7 suites ±0    1 💤 ±0 
 7 files   ±0    0 ❌ ±0 

Results for commit 427bb06. ± Comparison against base commit 9894532.

♻️ This comment has been updated with latest results.

Copy link

Bug-tracker data:
Ticket title is 'Create standard DAOS dashboards for Grafana'
Status is 'Resolved'
Labels: 'rel20_TB3,rel20_TB4'
https://daosio.atlassian.net/browse/DAOS-7203

Copy link

Bug-tracker data:
Ticket title is 'Create standard DAOS dashboards for Grafana'
Status is 'Resolved'
Labels: 'rel20_TB3,rel20_TB4'
https://daosio.atlassian.net/browse/DAOS-7203

@mjmac mjmac force-pushed the mjmac/DAOS-7203 branch from 427bb06 to b74fdf0 Compare May 3, 2024 15:53
@mjmac mjmac changed the base branch from mjmac/agent_prom to master May 3, 2024 15:53
Copy link

github-actions bot commented May 3, 2024

Bug-tracker data:
Ticket title is 'Create standard DAOS dashboards for Grafana'
Status is 'Resolved'
Labels: 'rel20_TB3,rel20_TB4'
https://daosio.atlassian.net/browse/DAOS-7203

Copy link

github-actions bot commented May 3, 2024

Ticket title is 'Create standard DAOS dashboards for Grafana'
Status is 'Resolved'
Labels: 'rel20_TB3,rel20_TB4'
https://daosio.atlassian.net/browse/DAOS-7203

@daosbuild1
Copy link
Collaborator

Test stage Functional on EL 8.8 completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-5382/11/execution/node/1177/log

@daltonbohning
Copy link
Contributor

FYI - I think the dkey_akey_enum_punch.py failures are being fixed in #14301

@daltonbohning
Copy link
Contributor

FYI - I think the dkey_akey_enum_punch.py failures are being fixed in #14301

Actually, sorry. That looks like a different issue

@mjmac mjmac force-pushed the mjmac/DAOS-7203 branch 2 times, most recently from 5fdafd4 to 5ac09d7 Compare October 31, 2024 19:55
Update the Prometheus exporter to support passthrough histograms
from native DAOS telemetry format. Fixes a few bugs and inefficiencies
in the native histogram implementation.

Features: telemetry
Required-githooks: true
Change-Id: I7842cc48a107ec0ba0ec93472fb6684db7394d30
Signed-off-by: Michael MacDonald <mjmac@google.com>
@daosbuild1
Copy link
Collaborator

Test stage Functional Hardware Medium completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-5382/14/testReport/

@mjmac mjmac marked this pull request as ready for review November 2, 2024 17:22
@mjmac mjmac requested review from a team as code owners November 2, 2024 17:22
@mjmac
Copy link
Contributor Author

mjmac commented Nov 2, 2024

Note to reviewers... This is finally in a state where I think it's ready for landing. I reworked the patch a bit. The original version changed the fetch/update bytes counters to histograms in addition to adding all of the logic for exporting via Prometheus. I've reverted that part of it, which should make this patch less risky. Longer term, I would like to revisit that other change, but I'll do that separately so that we can evaluate it in isolation.

Copy link
Contributor

@kjacque kjacque left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work. I just had a couple minor comments, which I wouldn't block on.

@@ -3200,6 +3226,8 @@ d_tm_init_histogram(struct d_tm_node_t *node, char *path, int num_buckets,
rc = -DER_NO_SHMEM;
goto failure;
}
histogram->dth_fast_bucketing =
(multiplier == 2 && (initial_width & (initial_width - 1)) == 0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be good to describe this special case in a comment. Is this saying that the initial width is also a power of two?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to me it reads "set dth_fast_bucketing iff mult is 2 and least-significant-bit of initial_width is set, is that correct?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll add a comment in a subsequent patch. The idea is easier to see with an example. Say initial_width is 256. As a bit pattern, that value looks like this: 0b100000000. Then, if we subtract 1 from it, we get this pattern: 0b11111111. If we AND those together, we get 0. This only works for numbers that are clean powers of 2.

Even easier to see if we "stack" the two values:

100000000
 11111111

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When you do add the comment, it would be great to detail not only the meaning of the bitwise math (although maybe that would be most clarified by using a named helper function or macro), but also why fast bucketing only works in the case where everything is a power of two. I think detailing the principle behind it will be most helpful in the future.

Comment on lines +142 to +143
if err != nil {
log.Errorf("[%s]: failed to get histogram buckets", baseName)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be useful to log the actual error here too?

Comment on lines +502 to +506
cmpOpts := cmp.Options{
cmpopts.SortSlices(func(a, b string) bool {
return a < b
}),
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! IMO we should be using this in all our unit tests instead of always sorting the values we return in production code. In most cases the order really doesn't matter. To that end, I think it would be nice to add some kind of utility function in the test package for this cmp option.

Copy link
Contributor

@tanabarr tanabarr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work! nothing blocking

@@ -25,8 +25,11 @@ import (
"time"
)

var _ StatsMetric = (*Duration)(nil)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: missing description comment for exported type

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not an exported type, it's an assertion that Duration implements StatsMetric.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the comment is referring to "Duration" type below

Values []uint64
}

// Histogram provides access to histogram data associated with
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: unnecessary line break

@@ -3200,6 +3226,8 @@ d_tm_init_histogram(struct d_tm_node_t *node, char *path, int num_buckets,
rc = -DER_NO_SHMEM;
goto failure;
}
histogram->dth_fast_bucketing =
(multiplier == 2 && (initial_width & (initial_width - 1)) == 0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to me it reads "set dth_fast_bucketing iff mult is 2 and least-significant-bit of initial_width is set, is that correct?

@mjmac mjmac merged commit 248bb75 into master Nov 9, 2024
55 checks passed
@mjmac mjmac deleted the mjmac/DAOS-7203 branch November 9, 2024 15:29
mjmac added a commit that referenced this pull request Nov 9, 2024
Update the Prometheus exporter to support passthrough histograms
from native DAOS telemetry format. Fixes a few bugs and inefficiencies
in the native histogram implementation.

Signed-off-by: Michael MacDonald <mjmac@google.com>
jolivier23 pushed a commit that referenced this pull request Nov 12, 2024
… (#15484)

Update the Prometheus exporter to support passthrough histograms
from native DAOS telemetry format. Fixes a few bugs and inefficiencies
in the native histogram implementation.

Signed-off-by: Michael MacDonald <mjmac@google.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

5 participants