-
Notifications
You must be signed in to change notification settings - Fork 5.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Unify graphics SMI implementations #9815
Conversation
Hi Sven, I like how this PR lets the two plugins become configuration instead of code. (in What do you think about making the graphics_smi class more generic so it can be used by plugins that aren't gpu related, and then putting it in plugins/common? Then adm_rocm_smi and nvidia_smi would stay in their separate directories but use the common code. I would prefer to keep telegraf's convention of each input plugin having its own directory in plugins/inputs, where users can find the readme.md, the plugin implementation and unit tests, and shared code in plugins/common. |
Looks like new artifacts were built from this PR. Get them here!Artifact URLs |
List of plugins that might also benefit from such makeover. |
…ralizes the usual 'get data from a data-source and parse the result' pattern.
…at is embedded into the binary.
…that is embedded into the binary.
if err := toml.Unmarshal(cfgfile, &cfg); err != nil { | ||
panic(fmt.Errorf("cannot unmarshal 'amd_rocm_smi_parser.conf': %v", err)) | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see the benefit to putting the config in a .toml file and calling Unmarshall at runtime. We don't do that for other configuration data in the codebase.
Would it be ok with you to move the toml data directly into go struct literals here in the go source code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure I can do this by just reverting the last few (3 IIRC) commits.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's do revert those commits. It's clearer to put them in static structs in the code, and has the benefit of failing at compile time instead of runtime if there's a typo.
type Receiver interface { | ||
// Receive data from the endpoint(s) | ||
Receive() ([]byte, error) | ||
Transport | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this fit better in package receive_parse? Referring to it as transport.Receiver was confusing to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm I was thinking of also adding other transports (http
etc.) here for covering more items and maybe also add Sender
later to cover output plugins.
So my view was that it is a property of a transport to be able to Receive (and later to also Send)... But if you prefer to have this interface definition in the generic plugin, I can do it. What is your decision? :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems better in receive_parse to me but I'm ok with it either place.
Co-authored-by: reimda <reimda@users.noreply.github.com>
📦 Looks like new artifacts were built from this PR. Expand this list to get them here! 🐯Artifact URLs |
Can we finalise this one? |
I'm happy to merge once we revert the toml.Unmarshal changes and switch back to static struct initialization. |
@srebhan Do you know it that has been done already? |
next steps: @srebhan to look into reverting toml.Unmarshal changes and switch back to static struct initialization |
This only pays off if we convert more plugins to the infrastructure introduced in this PR. As time currently does not permit this, I will close the PR for now. If anyone is willing to work on this (and convert more plugins) please feel free to reopen or ping me. |
I would be able to migrate some if the framework is already there.. Now there is nothing.. |
@Hipska as I wrote myself does not have time to convert them, but I can finish up this PR if you commit to convert others. ;-) |
Required for all PRs:
Both SMI implementations,
nvidia_smi
as well asamd_rocm_smi
, share a common code structure: Try first execute a binary that produces the data and then parse the data into metrics using a kind of mapping definition. The present PR unifies this common structure and merges both plugins into a new genericReceiveAndParse
plugin. This new plugin provides a common infrastructure allowing to split the processing into atransport
which receives the raw data and aparser
transforming this raw data to metrics. Additional post-processing capabilities are provided to allow further metric transformation impossible in the parser. The whole change by this PR is transparent to the user.To generalize more, a
common/transport
package is introduced that abstracts the way of receiving the data. For now only anExec
receiver is provided sufficient to unify the two graphics plugins. However, in the future we might want to extend this to e.g. HTTP transports or similar.For the parsing part, we uses telegraf's plugin power to instantiate a suitable parser. We configure this parser with a config file embedded into the telegraf binary.
Please note that more such common structures exist in telegraf e.g. for web-plugins that do a HTTP request and then parse the response (usually JSON) to metrics. Those plugins might also be unified in a similar way.