Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Coalesce some statistics into a single metric #87

Merged
merged 1 commit into from
Jun 6, 2017
Merged

Conversation

joehandzik
Copy link
Contributor

Signed-Off-By: Joe Handzik joseph.t.handzik@hpe.com

Fixes #64

@joehandzik joehandzik requested a review from roclark May 30, 2017 20:34
@joehandzik
Copy link
Contributor Author

I fully expect this PR to need some major cleanup before merging. It's ugly and hacky, but it does work. So, thoughts on the general concept and the result that it comes up with would be appreciated.

@roclark
Copy link
Contributor

roclark commented May 31, 2017

I don't see anything wrong or bad about this PR, though I think we can certainly make it more efficient. As a quick brain exercise, I created this semi-pseudo function using a lot of your code in the getJobStatsByOperation function:

func getJobStatsOperation(job string, jobID string) (metricList []lustreJobsMetric, err error) {
	opList := []string{"open", "close", "mknod", "link", "unlink", "mkdir", "rmdir", "rename", "getattr", "setattr", "getxattr", "setxattr", "statfs", "sync", "samedir_rename", "crossdir_rename", "punch", "destroy", "create", "get_info", "set_info", "quotactl"}
	for operation := range opList {
		opStat := regexCaptureString(operation+": .*", jobBlock)
		if len(opStat) < 1 {
			continue
		}
		opNumbers := regexCaptureStrings("[0-9]*.[0-9]+|[0-9]+", opStat)
		if len(opNumbers) < 1 {
			continue
		}
		result, err := strconv.ParseUint(strings.TrimSpace(opNumbers[0]), 10, 64)
		if err != nil {
			return nil, err
		}
		extraLabelValue = operation
		helpText = jobStatsHelp
		l := lustreStatsMetric{
			title:           promName,
			help:            helpText,
			value:           result,
			extraLabel:      "operation",
			extraLabelValue: extraLabelValue,
		}
		metricList = append(metricList, lustreJobsMetric{jobID, l})		
	}

    return metricList, err
}

With this, we can replace parseJobStatsText with something along the lines of:

func parseJobStatsText(jobStats string, promName string, helpText string) (metricList []lustreJobsMetric, err error) {
    jobs := regexCaptureStrings("(?ms:job_id:.*?(-|\\z))", jobStats)
    if len(jobs) < 1 {
        return nil, nil
    }

    for _, job := range jobs {
        jobID, err := getJobNum(job)
        if err != nil {
            return nil, err
        }
        if helpText == jobStatsHelp {
            jobList, err := getJobStatsOperation(job, jobID)
            if err != nil {
                return nil, err
            }
        } else {
            jobList, err := getJobStatsIO(job, jobID, promName, helpText)  // <- Note, the original getJobStatsByOperation will be renamed to getJobStatsIO
            if err != nil {
                return nil, err
            }
        }
        for _, item := range jobList {
            metricList = append(metricList, item)
        }
    }
    return metricList, nil
}

The same principles would apply for the stats and md_stats files. I think this would simplify some of the logic and improve performance while reducing redundancies.

For performance, we can just parse the file once to get all metrics instead of once per metric. As in, if we have a file with 10 operations in it, with this method, we can just parse it once and move on instead of parsing it 10 times. This probably makes more sense with the metrics being bundled together anyway.

This will also help reduce redundancies by removing all of the help and type declarations for every metric. We won't need to specify every helpText for the operations, especially since we aren't even using that text anymore.

Just a few quick thoughts of mine, let me know if you have any questions or comments on this. Or, if you think we could save something like this for another PR.

@joehandzik
Copy link
Contributor Author

This is a WIP at the moment, unit tests are going to be broken. I'll let you know when it's ready for re-review.

@joehandzik joehandzik force-pushed the wip-fix-issue-64 branch 6 times, most recently from b8575e5 to 6d37ca2 Compare June 5, 2017 21:08
@joehandzik
Copy link
Contributor Author

@roclark When you have a chance, take a look and let me know what you think. This is fully updated and representative of what I would like to merge. Feel free to be nitpicky about names and such.

Copy link
Contributor

@roclark roclark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few comments here - let me know if you have any questions on them!

}
for _, operation := range operationSlice {
opStat := regexCaptureString(operation.pattern+": .*", jobBlock)
opNumbers := regexCaptureStrings("[0-9]*.[0-9]+|[0-9]+", opStat)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you aren't using raw string literals here, don't forget the double backslash before the dot since this will match any character between two numbers (e.g. 1234a5678 will return 1234a5678 instead of 1234, 5678). The pattern will look something like this: [0-9]*\\.[0-9]+|[0-9]+.

As an aside, I will cleanup and move all of these number parsers in a future PR, so it isn't too big of a deal.

pingHelp: {pattern: "ping .*", index: 1},
}
bytesString := regexCaptureString(bytesMap[helpText].pattern, statsFile)
readSamplesHelp: {pattern: "read_bytes .*", index: 1},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As per #88, the wildcard is added during initialization while the maps on lines 338, 472 and 514 add the wildcard later. Do we want to follow suit and add + " .*" to the end of line 398?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, along the lines of #88, it might be a good idea to transition to the number regex pattern we use in all other functions as opposed to matching strings, both for consistency and safety. Though, this probably belongs in another PR which I can certainly pickup in the future.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I figure we can clean the non-converged stats up later in a separate PR.

{"job_stats", "job_crossdir_rename_total", crossdirRenameHelp, s.counterMetric},
{"md_stats", "stats_total", statsHelp, s.counterMetric, true},
{"num_exports", "exports_total", "Total number of times the pool has been exported", s.counterMetric, false},
{"job_stats", "job_stats_total", jobStatsHelp, s.counterMetric, false},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could be I'm off here, but shouldn't the hasMultipleVals element be true?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are correct. Will fix.

{"job_stats", "job_sync_total", syncHelp, s.counterMetric},
{"job_stats", "job_samedir_rename_total", samedirRenameHelp, s.counterMetric},
{"job_stats", "job_crossdir_rename_total", crossdirRenameHelp, s.counterMetric},
{"md_stats", "stats_total", statsHelp, s.counterMetric, true},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that we are changing how these stats are displayed, do we want to keep this md_stats as opposed to just stats? The nodeType label will be mdt so we should be able to differentiate this from other stats. Mostly putting this here as a mild discussion point, not necessarily because I have an opinion.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'll want to change it to stats instead of md_stats, IMO.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoops, I was looking at the filename instead of the metric name. Nevermind!

title: promName,
help: helpText,
value: result,
extraLabel: "operation",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curious, is operation ever used or displayed anywhere? I know you decide how to handle stats metrics based on the extraLabelValue which is empty in this case, but any reason you are setting the extraLabel?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

extraLabel is what we use to figure out what the label should be called. I could hard code it I suppose, but I try to avoid that.

title: promName,
help: helpText,
value: result,
extraLabel: "operation",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same question here as in line 416.

@joehandzik
Copy link
Contributor Author

@roclark This is done from my perspective. Let me know if you think any of the comments you've left here are dealbreakers.

@joehandzik joehandzik force-pushed the wip-fix-issue-64 branch 2 times, most recently from 1645c8d to f7d7174 Compare June 6, 2017 17:04
@roclark
Copy link
Contributor

roclark commented Jun 6, 2017

Looks like we just need to remove the operation label in the stats_integration_test.go file, then everything should work fine. Once done, you good to merge?

Signed-Off-By: Joe Handzik <joseph.t.handzik@hpe.com>
@joehandzik
Copy link
Contributor Author

@roclark Ha, I pushed before lunch assuming I had bigger problems. Looks like it was that simple. Merge if you're ready.

@roclark roclark merged commit 6863c97 into master Jun 6, 2017
@joehandzik joehandzik deleted the wip-fix-issue-64 branch June 6, 2017 19:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants