Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cpufreq plugin #4215

Closed
wants to merge 6 commits into from
Closed

Cpufreq plugin #4215

wants to merge 6 commits into from

Conversation

R4scal
Copy link
Contributor

@R4scal R4scal commented May 31, 2018

Hi.

This plugin work on Linux systems and collect information about cpu frequency and throttles

Required for all PRs:

  • Signed CLA.
  • Associated README.md updated.
  • Has appropriate unit tests.

Copy link
Contributor

@danielnelson danielnelson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR @R4scal. I haven't looked through the code closely yet but have a few ideas based on the README.

# throttles_per_socket = false
## Gather CPU throttles per physical core
## Defaults:
# throttles_per_core = false
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we won't need these two option to enable/disable per core and socket, let's just have them be always on?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I agree with a single option. But if we default to on, the code should be adjusted to not throw an error if the core_throttle_count file is missing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think the performance difference would be noticeable if we removed all of the options? The measurement filtering options can be used to trim out any unwanted data so the main argument in favor would be to reduce the cost of reading the files.

## Path for sysfs filesystem.
## See https://www.kernel.org/doc/Documentation/filesystems/sysfs.txt
## Defaults:
# path_sysfs = "/sys"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Call this host_sys and add support for HOST_SYS environment variable which is used only if the option is unset. This is similar to the support in gopsutil.

> cpufreq,cpu=5,host=server01 cur_freq=3801758000,max_freq=3900000000,min_freq=800000000 1527789803000000000
> cpufreq,cpu=6,host=server01 cur_freq=3839194000,max_freq=3900000000,min_freq=800000000 1527789803000000000
> cpufreq,cpu=7,host=server01 cur_freq=3877989000,max_freq=3900000000,min_freq=800000000 1527789803000000000
> cpufreq_cpu_throttles,cpu=0,host=server01 count=0 1527789803000000000
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's merge this series with cpufreq since it has the same tagset and represents the same object.

> cpufreq_core_throttles,core=0,cpu=0,host=server01 count=0 1527789803000000000
> cpufreq_core_throttles,core=1,cpu=0,host=server01 count=0 1527789803000000000
> cpufreq_core_throttles,core=2,cpu=0,host=server01 count=0 1527789803000000000
> cpufreq_core_throttles,core=3,cpu=0,host=server01 count=0 1527789803000000000
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's call this measurement cpufreq_core and the field can be just throttles. This will allow us to add more per core metrics later on.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you're contradicting yourself. Your comment right above says to merge cpufreq_core_throttles with cpufreq. But this comment says to just rename cpufreq_core_throttles to cpufreq_core :-)

I'm in favor of merging both into the cpufreq measurement. No need for 2 measurements.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking there would be two measurements: cpufreq and cpufreq_core. cpufreq_core would have a core and cpu tag while cpufreq would only have cpu.

If we have this, then it is hard to select just the core throttling because you have to exclude the item without the tag:

cpufreq,cpu=0 throttle=42i,cur_freq=3756293000i
cpufreq,cpu=0,core=0 throttle=42i

This could maybe work, but not sure if I like the naming as much:

cpufreq,cpu=0 throttle=42i,cur_freq=3756293000i
cpufreq,cpu=0,core=0 core_throttle=42i

And here is an example of what I was thinking originally:

cpufreq,cpu=0 throttle=42i,cur_freq=3756293000i
cpufreq_core,cpu=0,core=0 throttle=42i

@danielnelson
Copy link
Contributor

@phemmer @randallt Could you review if this would meet your requirements?

acc.AddError(err)
continue
} else {
fileds["cur_freq"] = float64(value) * 1000.0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are these stored as float64, and not int?

# throttles_per_socket = false
## Gather CPU throttles per physical core
## Defaults:
# throttles_per_core = false
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I agree with a single option. But if we default to on, the code should be adjusted to not throw an error if the core_throttle_count file is missing.

> cpufreq_core_throttles,core=0,cpu=0,host=server01 count=0 1527789803000000000
> cpufreq_core_throttles,core=1,cpu=0,host=server01 count=0 1527789803000000000
> cpufreq_core_throttles,core=2,cpu=0,host=server01 count=0 1527789803000000000
> cpufreq_core_throttles,core=3,cpu=0,host=server01 count=0 1527789803000000000
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you're contradicting yourself. Your comment right above says to merge cpufreq_core_throttles with cpufreq. But this comment says to just rename cpufreq_core_throttles to cpufreq_core :-)

I'm in favor of merging both into the cpufreq measurement. No need for 2 measurements.

acc.AddFields(
"cpufreq_cpu_throttles",
map[string]interface{}{
"count": float64(packageThrottleCount),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Repeat of the float64 vs int question. Why?

@phemmer
Copy link
Contributor

phemmer commented Jun 13, 2018

Thoughts on including cpuinfo_min_freq and cpuinfo_max_freq as well? Can be useful, especially when scaling_(min|max)_freq are not the same, and/or being dynamically adjusted.

@Xyaren
Copy link

Xyaren commented Mar 31, 2019

Is there anything new on here ?

@glinton
Copy link
Contributor

glinton commented Apr 2, 2019

I believe we're still waiting for @R4scal to address the feedback

@broferek
Copy link

Is there anything new ? This would be a really useful plugin if available

@glinton
Copy link
Contributor

glinton commented Aug 29, 2019

If someone wants to pick this up where R4scal left off, let me know and I'll "assign" you to this so the work isn't duplicated.

@R4scal
Copy link
Contributor Author

R4scal commented Oct 26, 2019

Hi.
Sorry for missing so long. I performed a rebase from the current master and corrected code according to the comments in the reviews.

@R4scal
Copy link
Contributor Author

R4scal commented Nov 5, 2019

@danielnelson @phemmer ping

@R4scal
Copy link
Contributor Author

R4scal commented Nov 14, 2019

@danielnelson @phemmer @glinton ping

@danielnelson danielnelson self-requested a review November 14, 2019 20:39
@alatteri
Copy link

alatteri commented Dec 13, 2019

@danielnelson Would love to see this pulled into master.

@vazhem
Copy link

vazhem commented Jun 3, 2020

Any plans to have this merged?
It is important to have these metrics to alert on bad nodes if current frequency per cpu is lower then node's minimum.

@eshelman
Copy link

Adding another voice of support for this plugin. Hoping it can be merged?

In my field (High Performance Computing), we run the CPUs at 100% utilization most of the time and it's very desirable to measure operating frequencies.

@jose-d
Copy link
Contributor

jose-d commented Jul 17, 2020

@eshelman

.. In my field (High Performance Computing) ..

ya, another HPC guy here.
Before $this will be merged you can go with mine exec collector for cpufreqs: https://github.com/jose-d/telegraf-collectors

## Defaults:
# gather_throttles = false
```

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please add a Metric section describing the fields similar to #8342!?

Comment on lines +21 to +28
> cpufreq,cpu=0,host=server01 cur_freq=3756293000i,max_freq=3900000000i,min_freq=800000000i,throttle_count=0i 1527789803000000000
> cpufreq,cpu=1,host=server01 cur_freq=3735119000i,max_freq=3900000000i,min_freq=800000000i,throttle_count=0i 1527789803000000000
> cpufreq,cpu=2,host=server01 cur_freq=3786381000i,max_freq=3900000000i,min_freq=800000000i,throttle_count=0i 1527789803000000000
> cpufreq,cpu=3,host=server01 cur_freq=3823190000i,max_freq=3900000000i,min_freq=800000000i,throttle_count=0i 1527789803000000000
> cpufreq,cpu=4,host=server01 cur_freq=3780804000i,max_freq=3900000000i,min_freq=800000000i,throttle_count=0i 1527789803000000000
> cpufreq,cpu=5,host=server01 cur_freq=3801758000i,max_freq=3900000000i,min_freq=800000000i,throttle_count=0i 1527789803000000000
> cpufreq,cpu=6,host=server01 cur_freq=3839194000i,max_freq=3900000000i,min_freq=800000000i,throttle_count=0i 1527789803000000000
> cpufreq,cpu=7,host=server01 cur_freq=3877989000i,max_freq=3900000000i,min_freq=800000000i,throttle_count=0i 1527789803000000000
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about renaming cur_freq, max_freq, min_freq to current, max, min? The freq_ part seems superfluous given that the metric is named cpufreq.

@@ -0,0 +1,116 @@
package cpufreq
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As this only works on linux, please make sure it only builds on linux.

Suggested change
package cpufreq
// +build linux
package cpufreq

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@R4scal
Moreover, you will need to add cpufreq_notlinux.go file with following content:

// +build !linux

package cpufreq

To build (but not include in binary) your plugin succesfully for non-Linux systems.

Look at systtat plugin: https://github.com/influxdata/telegraf/tree/f7d94430d29d15c42d8d186c99bfd065b9a21fa0/plugins/inputs/sysstat

}

func (g *CPUFreq) Description() string {
return "Read specific statistics per cgroup"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably

Suggested change
return "Read specific statistics per cgroup"
return "Collect current CPU's frequencies"

would be better, wouldn't it?

Comment on lines +42 to +48
if g.PathSysfs == "" {
if os.Getenv("HOST_SYS") != "" {
g.PathSysfs = os.Getenv("HOST_SYS")
} else {
g.PathSysfs = defaultHostSys
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move this to an Init() function that is called once when the plugin is constructed. This will avoid doing it over and over again.


cpus, err := filepath.Glob(path.Join(g.PathSysfs, "devices/system/cpu/cpu[0-9]*"))
if err != nil {
acc.AddError(err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

filepath.Glob will only return an error for invalid patterns according to its documentation. Therefore, and because you won't have cpus anyway, we should simply return err here.

Comment on lines +68 to +69
acc.AddError(err)
} else {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Put a `continue here and remove the else saving one indentation level.

if value, err = readUintFromFile(filepath.Join(cpu, "cpufreq", "scaling_cur_freq")); err != nil {
acc.AddError(err)
continue
} else {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for an else here.

}
if value, err = readUintFromFile(filepath.Join(cpu, "cpufreq", "scaling_min_freq")); err != nil {
acc.AddError(err)
} else {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use continue as with the previous condition. Maybe it would be nice to avoid the code duplication at all and construct a for loop running over {"min", "max", "cur"}.

acc.AddError(err)
continue
} else {
fileds["cur_freq"] = uint64(value) * 1000
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo fields instead of fileds.

Comment on lines +43 to +44
if os.Getenv("HOST_SYS") != "" {
g.PathSysfs = os.Getenv("HOST_SYS")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is HOST_SYS a standard environment variable? I didn't find any documentation about it. Can you point me to a reference?
If it is not, remove this section as an alternative prefix can still be specified in the config.

Copy link
Member

@srebhan srebhan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are some minor comments. Nothing too big. @R4scal could you please take a look!?

@srebhan srebhan self-assigned this Dec 3, 2020
@sjwang90 sjwang90 added this to the Planned milestone Dec 9, 2020
@srebhan
Copy link
Member

srebhan commented Dec 18, 2020

@R4scal any chance you can work on this to resolve the comments?

@R4scal
Copy link
Contributor Author

R4scal commented Jan 1, 2021

Hi.
During the 2 years that have passed since the opening of this MR, I was able to completely stop using anything from TICK and I am happy.
Happy New Year!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.