Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Unify graphics SMI implementations #9815

Closed
wants to merge 13 commits into from

Conversation

srebhan
Copy link
Member

@srebhan srebhan commented Sep 24, 2021

Required for all PRs:

  • Updated associated README.md.
  • Wrote appropriate unit tests.
  • Pull request title or commits are in conventional commit format (e.g. feat: or fix:)

Both SMI implementations, nvidia_smi as well as amd_rocm_smi, share a common code structure: Try first execute a binary that produces the data and then parse the data into metrics using a kind of mapping definition. The present PR unifies this common structure and merges both plugins into a new generic ReceiveAndParse plugin. This new plugin provides a common infrastructure allowing to split the processing into a transport which receives the raw data and a parser transforming this raw data to metrics. Additional post-processing capabilities are provided to allow further metric transformation impossible in the parser. The whole change by this PR is transparent to the user.

To generalize more, a common/transport package is introduced that abstracts the way of receiving the data. For now only an Exec receiver is provided sufficient to unify the two graphics plugins. However, in the future we might want to extend this to e.g. HTTP transports or similar.
For the parsing part, we uses telegraf's plugin power to instantiate a suitable parser. We configure this parser with a config file embedded into the telegraf binary.

Please note that more such common structures exist in telegraf e.g. for web-plugins that do a HTTP request and then parse the response (usually JSON) to metrics. Those plugins might also be unified in a similar way.

@telegraf-tiger telegraf-tiger bot added fix pr to fix corresponding bug new plugin plugin/input 1. Request for new input plugins 2. Issues/PRs that are related to input plugins labels Sep 24, 2021
@srebhan srebhan requested a review from reimda September 25, 2021 15:02
@srebhan srebhan marked this pull request as ready for review September 25, 2021 15:02
@reimda
Copy link
Contributor

reimda commented Sep 29, 2021

Hi Sven, I like how this PR lets the two plugins become configuration instead of code. (in NewAMDSMI() and NewNvidiaSMI())

What do you think about making the graphics_smi class more generic so it can be used by plugins that aren't gpu related, and then putting it in plugins/common? Then adm_rocm_smi and nvidia_smi would stay in their separate directories but use the common code.

I would prefer to keep telegraf's convention of each input plugin having its own directory in plugins/inputs, where users can find the readme.md, the plugin implementation and unit tests, and shared code in plugins/common.

@helenosheaa helenosheaa added feat Improvement on an existing feature such as adding a new setting/mode to an existing plugin and removed fix pr to fix corresponding bug new plugin plugin/input 1. Request for new input plugins 2. Issues/PRs that are related to input plugins labels Oct 4, 2021
@Hipska
Copy link
Contributor

Hipska commented Oct 11, 2021

List of plugins that might also benefit from such makeover.

plugins/common/receive_parse/receive_parse.go Outdated Show resolved Hide resolved
Comment on lines +26 to -48
if err := toml.Unmarshal(cfgfile, &cfg); err != nil {
panic(fmt.Errorf("cannot unmarshal 'amd_rocm_smi_parser.conf': %v", err))
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see the benefit to putting the config in a .toml file and calling Unmarshall at runtime. We don't do that for other configuration data in the codebase.

Would it be ok with you to move the toml data directly into go struct literals here in the go source code?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure I can do this by just reverting the last few (3 IIRC) commits.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's do revert those commits. It's clearer to put them in static structs in the code, and has the benefit of failing at compile time instead of runtime if there's a typo.

Comment on lines +14 to +18
type Receiver interface {
// Receive data from the endpoint(s)
Receive() ([]byte, error)
Transport
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this fit better in package receive_parse? Referring to it as transport.Receiver was confusing to me.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm I was thinking of also adding other transports (http etc.) here for covering more items and maybe also add Sender later to cover output plugins.
So my view was that it is a property of a transport to be able to Receive (and later to also Send)... But if you prefer to have this interface definition in the generic plugin, I can do it. What is your decision? :-)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems better in receive_parse to me but I'm ok with it either place.

Co-authored-by: reimda <reimda@users.noreply.github.com>
@Hipska
Copy link
Contributor

Hipska commented Sep 7, 2022

Can we finalise this one?

@reimda
Copy link
Contributor

reimda commented Sep 8, 2022

Can we finalise this one?

I'm happy to merge once we revert the toml.Unmarshal changes and switch back to static struct initialization.

@Hipska
Copy link
Contributor

Hipska commented Nov 25, 2022

@srebhan Do you know it that has been done already?

@powersj powersj self-assigned this Dec 12, 2022
@powersj powersj added the ready for final review This pull request has been reviewed and/or tested by multiple users and is ready for a final review. label Dec 12, 2022
@powersj powersj removed the ready for final review This pull request has been reviewed and/or tested by multiple users and is ready for a final review. label Jan 25, 2023
@powersj
Copy link
Contributor

powersj commented Feb 15, 2023

next steps: @srebhan to look into reverting toml.Unmarshal changes and switch back to static struct initialization

@powersj powersj assigned srebhan and unassigned Hipska and powersj Feb 15, 2023
@srebhan
Copy link
Member Author

srebhan commented Mar 8, 2023

This only pays off if we convert more plugins to the infrastructure introduced in this PR. As time currently does not permit this, I will close the PR for now. If anyone is willing to work on this (and convert more plugins) please feel free to reopen or ping me.

@srebhan srebhan closed this Mar 8, 2023
@Hipska
Copy link
Contributor

Hipska commented Mar 9, 2023

I would be able to migrate some if the framework is already there.. Now there is nothing..

@srebhan
Copy link
Member Author

srebhan commented Mar 9, 2023

@Hipska as I wrote myself does not have time to convert them, but I can finish up this PR if you commit to convert others. ;-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feat Improvement on an existing feature such as adding a new setting/mode to an existing plugin
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants