-
Notifications
You must be signed in to change notification settings - Fork 51
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Coalesce some statistics into a single metric #87
Conversation
I fully expect this PR to need some major cleanup before merging. It's ugly and hacky, but it does work. So, thoughts on the general concept and the result that it comes up with would be appreciated. |
febedf7
to
ae91b2a
Compare
I don't see anything wrong or bad about this PR, though I think we can certainly make it more efficient. As a quick brain exercise, I created this semi-pseudo function using a lot of your code in the func getJobStatsOperation(job string, jobID string) (metricList []lustreJobsMetric, err error) {
opList := []string{"open", "close", "mknod", "link", "unlink", "mkdir", "rmdir", "rename", "getattr", "setattr", "getxattr", "setxattr", "statfs", "sync", "samedir_rename", "crossdir_rename", "punch", "destroy", "create", "get_info", "set_info", "quotactl"}
for operation := range opList {
opStat := regexCaptureString(operation+": .*", jobBlock)
if len(opStat) < 1 {
continue
}
opNumbers := regexCaptureStrings("[0-9]*.[0-9]+|[0-9]+", opStat)
if len(opNumbers) < 1 {
continue
}
result, err := strconv.ParseUint(strings.TrimSpace(opNumbers[0]), 10, 64)
if err != nil {
return nil, err
}
extraLabelValue = operation
helpText = jobStatsHelp
l := lustreStatsMetric{
title: promName,
help: helpText,
value: result,
extraLabel: "operation",
extraLabelValue: extraLabelValue,
}
metricList = append(metricList, lustreJobsMetric{jobID, l})
}
return metricList, err
} With this, we can replace func parseJobStatsText(jobStats string, promName string, helpText string) (metricList []lustreJobsMetric, err error) {
jobs := regexCaptureStrings("(?ms:job_id:.*?(-|\\z))", jobStats)
if len(jobs) < 1 {
return nil, nil
}
for _, job := range jobs {
jobID, err := getJobNum(job)
if err != nil {
return nil, err
}
if helpText == jobStatsHelp {
jobList, err := getJobStatsOperation(job, jobID)
if err != nil {
return nil, err
}
} else {
jobList, err := getJobStatsIO(job, jobID, promName, helpText) // <- Note, the original getJobStatsByOperation will be renamed to getJobStatsIO
if err != nil {
return nil, err
}
}
for _, item := range jobList {
metricList = append(metricList, item)
}
}
return metricList, nil
} The same principles would apply for the For performance, we can just parse the file once to get all metrics instead of once per metric. As in, if we have a file with 10 operations in it, with this method, we can just parse it once and move on instead of parsing it 10 times. This probably makes more sense with the metrics being bundled together anyway. This will also help reduce redundancies by removing all of the help and type declarations for every metric. We won't need to specify every Just a few quick thoughts of mine, let me know if you have any questions or comments on this. Or, if you think we could save something like this for another PR. |
This is a WIP at the moment, unit tests are going to be broken. I'll let you know when it's ready for re-review. |
b8575e5
to
6d37ca2
Compare
@roclark When you have a chance, take a look and let me know what you think. This is fully updated and representative of what I would like to merge. Feel free to be nitpicky about names and such. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a few comments here - let me know if you have any questions on them!
sources/procfs.go
Outdated
} | ||
for _, operation := range operationSlice { | ||
opStat := regexCaptureString(operation.pattern+": .*", jobBlock) | ||
opNumbers := regexCaptureStrings("[0-9]*.[0-9]+|[0-9]+", opStat) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since you aren't using raw string literals here, don't forget the double backslash before the dot since this will match any character between two numbers (e.g. 1234a5678
will return 1234a5678
instead of 1234, 5678
). The pattern will look something like this: [0-9]*\\.[0-9]+|[0-9]+
.
As an aside, I will cleanup and move all of these number parsers in a future PR, so it isn't too big of a deal.
pingHelp: {pattern: "ping .*", index: 1}, | ||
} | ||
bytesString := regexCaptureString(bytesMap[helpText].pattern, statsFile) | ||
readSamplesHelp: {pattern: "read_bytes .*", index: 1}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As per #88, the wildcard is added during initialization while the maps on lines 338, 472 and 514 add the wildcard later. Do we want to follow suit and add + " .*"
to the end of line 398?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, along the lines of #88, it might be a good idea to transition to the number regex pattern we use in all other functions as opposed to matching strings, both for consistency and safety. Though, this probably belongs in another PR which I can certainly pickup in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I figure we can clean the non-converged stats up later in a separate PR.
sources/procfs.go
Outdated
{"job_stats", "job_crossdir_rename_total", crossdirRenameHelp, s.counterMetric}, | ||
{"md_stats", "stats_total", statsHelp, s.counterMetric, true}, | ||
{"num_exports", "exports_total", "Total number of times the pool has been exported", s.counterMetric, false}, | ||
{"job_stats", "job_stats_total", jobStatsHelp, s.counterMetric, false}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could be I'm off here, but shouldn't the hasMultipleVals
element be true
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are correct. Will fix.
{"job_stats", "job_sync_total", syncHelp, s.counterMetric}, | ||
{"job_stats", "job_samedir_rename_total", samedirRenameHelp, s.counterMetric}, | ||
{"job_stats", "job_crossdir_rename_total", crossdirRenameHelp, s.counterMetric}, | ||
{"md_stats", "stats_total", statsHelp, s.counterMetric, true}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that we are changing how these stats are displayed, do we want to keep this md_stats
as opposed to just stats
? The nodeType
label will be mdt
so we should be able to differentiate this from other stats
. Mostly putting this here as a mild discussion point, not necessarily because I have an opinion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'll want to change it to stats instead of md_stats, IMO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whoops, I was looking at the filename instead of the metric name. Nevermind!
sources/procfs.go
Outdated
title: promName, | ||
help: helpText, | ||
value: result, | ||
extraLabel: "operation", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just curious, is operation
ever used or displayed anywhere? I know you decide how to handle stats metrics based on the extraLabelValue
which is empty in this case, but any reason you are setting the extraLabel
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
extraLabel is what we use to figure out what the label should be called. I could hard code it I suppose, but I try to avoid that.
sources/procfs.go
Outdated
title: promName, | ||
help: helpText, | ||
value: result, | ||
extraLabel: "operation", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same question here as in line 416.
6d37ca2
to
9b357c0
Compare
@roclark This is done from my perspective. Let me know if you think any of the comments you've left here are dealbreakers. |
1645c8d
to
f7d7174
Compare
Looks like we just need to remove the |
Signed-Off-By: Joe Handzik <joseph.t.handzik@hpe.com>
f7d7174
to
bd631c4
Compare
@roclark Ha, I pushed before lunch assuming I had bigger problems. Looks like it was that simple. Merge if you're ready. |
Signed-Off-By: Joe Handzik joseph.t.handzik@hpe.com
Fixes #64