Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update Lustre test data and fix metrics for 2.10.1 #130

Merged
merged 8 commits into from
Dec 4, 2017

Conversation

joehandzik
Copy link
Contributor

Multiple parts here:

  1. Update all procfs files based on a running Lustre 2.10.1 cluster.
  2. Update our testing mocks to expect metrics that match the 2.10.1 data.
  3. Fix our exporter to match the 2.10.1 mocks.

The change set breaks at least ten or so metrics across the MGS, MDS, and generic metric categories. We view this as acceptable, since we have tagged 1.x versions that should work correctly with that data.

Robert Clark added 7 commits November 29, 2017 11:38
Signed-Off-By: Robert Clark <robert.d.clark@hpe.com>
Signed-Off-By: Robert Clark <robert.d.clark@hpe.com>
Signed-Off-By: Robert Clark <robert.d.clark@hpe.com>
Signed-Off-By: Robert Clark <robert.d.clark@hpe.com>
Signed-Off-By: Joe Handzik <joseph.t.handzik@hpe.com>
Copy link
Contributor

@roclark roclark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me! I just have two minor comments, but I'm happy with this as-is. Let me know when you are ready to merge and I will do so.

@@ -16,7 +16,7 @@ GOPATH := $(firstword $(subst :, ,$(shell $(GO) env GOPATH)))
PROMU ?= $(GOPATH)/bin/promu
GOLINTER ?= $(GOPATH)/bin/gometalinter
#aligncheck and gosimple took unfortunately too long at travisCI
GOLINTER_OPT ?= --vendor --deadline 6m --cyclo-over=20 --disable=aligncheck --disable=gosimple
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just for my curiosity, what was the issue with gotype?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gotype was catching a false positive in a variable not being defined in the sources package, which led to roughly two hours of serious golang soul searching. The code works correctly, it is a false positive as far as I can tell.

{"lustre_job_write_bytes_total", "The total number of bytes that have been written.", counter, []labelPair{{"component", "ost"}, {"jobid", "23"}, {"target", "lustrefs-OST0000"}}, 1.55500642304e+11, false},
{"lustre_job_write_bytes_total", "The total number of bytes that have been written.", counter, []labelPair{{"component", "ost"}, {"jobid", "24"}, {"target", "lustrefs-OST0000"}}, 2.15147593728e+11, false},
{"lustre_job_write_bytes_total", "The total number of bytes that have been written.", counter, []labelPair{{"component", "ost"}, {"jobid", "25"}, {"target", "lustrefs-OST0000"}}, 2.40823762944e+11, false},
{"lustre_job_write_bytes_total", "The total number of bytes that have been written.", counter, []labelPair{{"component", "ost"}, {"jobid", "26"}, {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Teeeeechnically I think this should be in a different section (//Health Metrics), but I'm definitely not offended by this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was tagged on the wrong line somehow... but I'm referring to line 1647 which I added as a standalone comment below. This can be disregarded!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ignoring like a champ

@@ -1631,7 +1644,7 @@ func TestCollector(t *testing.T) {
{"lustre_rpcs_in_flight", "Current number of RPCs that are processing during the snapshot.", gauge, []labelPair{{"component", "client"}, {"operation", "write"}, {"size", "9"}, {"target", "lustrefs-OST0000-osc-ffff88105db50000"}, {"type", "osc"}}, 272560, false},

// Generic Metrics
{"lustre_health_check", "Current health status for the indicated instance: 1 refers to 'healthy', 0 refers to 'unhealthy'", gauge, []labelPair{{"component", "generic"}, {"target", "lustre"}}, 1, false},
{"lustre_health_check", "Current health status for the indicated instance: 1 refers to 'healthy', 0 refers to 'unhealthy'", gauge, []labelPair{{"component", "health"}, {"target", "lustre"}}, 1, false},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Teeeeechnically I think this should be in a different section (//Health Metrics), but I'm definitely not offended by this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll move it! Thanks for catching.

@joehandzik joehandzik force-pushed the wip-update-lustre-test-data branch 2 times, most recently from a52f15e to fc57003 Compare December 4, 2017 16:15
@joehandzik
Copy link
Contributor Author

@roclark Fixed the test line and added an update to the README for the new health flag. Everything should be good to go!

@roclark roclark merged commit f657481 into master Dec 4, 2017
@joehandzik joehandzik deleted the wip-update-lustre-test-data branch December 5, 2017 17:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants