Adding Jobstats support to Lustre2 input plugin
Lustre Jobstats allows RPCs to be tagged with a value, such as a job's ID, which enables per-job statistics. This plugin collects those statistics and tags the data with the jobid. Closes #1107
a7b0861
@hanleyja I'm reviewing your code in the hopes of making a similar LustreClient plugin (read: borrowing large portions)
It might be my inexperience with Go, but I cannot see how, for a jobstats file that contains multiple jobs ("/proc/fs/lustre/obdfilter/*/job_stats"), you are differentiating one job's data from another's; it all appears to be lumped together and overwritten within the "fields" variable.
As well, each job_stats file that we have contains quite a few job_ids, each with a specific snapshot_time, which can identify an entry as a potential duplicate of one seen in the last polling period. Is it possible to handle this case within the Telegraf polling framework?
I'm actually working on a LustreClient plugin for telegraf myself, so it might be useful to combine some effort.
You're correct: the way this currently handles jobs, only the last job entry gets picked up. I have a fix for this that I'm running with, but I still need to generate a pull request. Yes, it's possible to override the timestamp by passing an additional argument to the `AddFields` call. I chose to keep the same logic as the original Lustre2 plugin and allow Telegraf to determine the timestamp. This will likely produce points with the same data at multiple time steps, at least for the range between `snapshot_time` and the configured `job_cleanup_interval`. The benefit is that the points become regularly spaced, leading to greater compression when writing to an InfluxDB output.
@hanleyja is there a unique tag that could be added for each job that could keep the points separated?
Yes, that should be the jobid that follows `- job_id:`. The issue is that not all of them are currently being carried through, only the last one. I have a fix that just uses an additional map field, which captures this jobid. That will keep the points unique.
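The idea of keying fields by jobid can be sketched as follows. This is not the fix referred to above, only an illustration under stated assumptions: `parseJobStats` is a hypothetical helper, and the sample input is deliberately simplified to plain integer values, so it is not the full layout of a real job_stats file.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// parseJobStats groups integer "key: value" lines under the most recent
// "- job_id:" entry, so every job keeps its own field map instead of
// overwriting a single shared one.
func parseJobStats(input string) map[string]map[string]int64 {
	jobs := make(map[string]map[string]int64)
	current := "" // jobid of the entry being read; empty before the first one
	sc := bufio.NewScanner(strings.NewReader(input))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if strings.HasPrefix(line, "- job_id:") {
			current = strings.TrimSpace(strings.TrimPrefix(line, "- job_id:"))
			if _, ok := jobs[current]; !ok {
				jobs[current] = make(map[string]int64)
			}
			continue
		}
		if current == "" {
			continue // header lines before the first job entry
		}
		parts := strings.SplitN(line, ":", 2)
		if len(parts) != 2 {
			continue
		}
		n, err := strconv.ParseInt(strings.TrimSpace(parts[1]), 10, 64)
		if err != nil {
			continue // this sketch ignores non-integer values
		}
		jobs[current][strings.TrimSpace(parts[0])] = n
	}
	return jobs
}

func main() {
	// Simplified sample input (assumption, not a real job_stats dump).
	sample := `job_stats:
- job_id:          1001
  snapshot_time:   1461772761
  read_samples:    4
- job_id:          1002
  snapshot_time:   1461772770
  read_samples:    9
`
	for id, fields := range parseJobStats(sample) {
		fmt.Println(id, fields)
	}
}
```

Each inner map can then be emitted as its own point with the jobid as a tag, which keeps points from different jobs separate.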