Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inconsistency of representation diskio from windows and linux #3798

Closed
Elufimov opened this issue Mar 23, 2017 · 22 comments · Fixed by #11635
Closed

Inconsistency of representation diskio from windows and linux #3798

Elufimov opened this issue Mar 23, 2017 · 22 comments · Fixed by #11635
Assignees
Labels
bug Metricbeat Metricbeat module Team:Integrations Label for the Integrations team :Windows

Comments

@Elufimov
Copy link
Contributor

Elufimov commented Mar 23, 2017

For confirmed bugs, please report:

  • Version: metricbeat 5.2.2
  • Operating System: windows server 2012 r2, ubuntu server 16.04

There is inconsistency between diskio Fields from linux and from windows machines. From linux hosts we get cumulative value, however from windows hosts we get "normal" value. It applies to

system.diskio.read.count
system.diskio.write.count
system.diskio.read.bytes
system.diskio.write.bytes
system.diskio.read.time
system.diskio.write.time
system.diskio.io.time
@tsg tsg added bug Metricbeat Metricbeat labels Mar 23, 2017
@tsg
Copy link
Contributor

tsg commented Mar 23, 2017

Pinging @andrewkroh.

@tsg tsg added the :Windows label Mar 23, 2017
@andrewkroh
Copy link
Member

Yeah, it does look inconsistent.

On Windows the third party lib is using this WMI query to get the data: SELECT * FROM Win32_PerfFormattedData_PerfDisk_LogicalDisk

Then is just writes the averaged values into the fields that are supposed to be counters. https://github.com/shirou/gopsutil/blob/v2.0.0/disk/disk_windows.go#L144-L152

From Windows docs:

Classes derived from Win32_PerfRawData contain raw, or "uncooked" performance data, and are supported by the Performance Counter provider. In contrast, classes derived from Win32_PerfFormattedData contain "cooked", or formatted data...

Maybe we can find a table that has raw values. Anyone want to research this?

@Elufimov
Copy link
Contributor Author

I also think these fields do not work right. I did small experiment, copied files to windows machine and this action was not reflected in elastic index.

@andrewkroh
Copy link
Member

The averaging from Windows is basically acting as a low-pass filter so those values are not very useful because you won't be able to observe high frequency signals in the data. Getting at the raw data (with relatively high sampling period) would solve this.

@Elufimov
Copy link
Contributor Author

Can I hope to see fix of this? We have mixed infrastructure (linux + windows) and this bug preventing us from bits adoption.

@martinscholz83
Copy link
Contributor

Hi @andrewkroh. Now our perfmon metricset is merged what do you think of to use this as an internal API to collect these data. The next feature I would add to metricset is to get raw data values.

@andrewkroh
Copy link
Member

@maddin2016 If it's possible to retrieve the raw values through perfmon then I think that would be a great solution to this issue.

@martinscholz83
Copy link
Contributor

@andrewkroh, can you please add this to #3828 so we don't miss this?

@ruflin
Copy link
Contributor

ruflin commented Mar 30, 2017

I would not add it into #3828 as it is not really a cleanup to perfmon metricset but enhancing a feature of an other metricset. I think it is perfectly tracked here as this is the issue that will be closed, in case there will be a PR for it.

@andrewkroh
Copy link
Member

FYI: Another report for diskio problems on Windows due to the usage of WMI. https://discuss.elastic.co/t/system-diskio-io-time-0-on-windows/85508

@martinscholz83
Copy link
Contributor

What do you think of adding a helper which collect these counters through the perfmon api? Or is it possible to add a diskio_windows.go file?

@andrewkroh
Copy link
Member

It should be possible to have a separate implementation for windows by using build tags (i.e. diskio_windows.go).

@Elufimov
Copy link
Contributor Author

Is this bug was fixed in 6.0?

@martinscholz83
Copy link
Contributor

martinscholz83 commented Nov 20, 2017

@andrewkroh, i'm currenty implement a solution to retrieve the raw values with perfmon. Are there a any example how the linux and windows outputs differ?

@andrewkroh
Copy link
Member

There is a sample document in the diskio docs: https://www.elastic.co/guide/en/beats/metricbeat/current/metricbeat-metricset-system-diskio.html#_fields_57

I don't have a sample from Windows, but the main difference that the values coming from WMI are averages rather than cumulative. On *nix we get an ever increasing counter (until rollover) that contains the total number for all time. So the differences is basically that on *nix we have counters and on Windows we have gauges (counter vs gauges).

@martinscholz83
Copy link
Contributor

@andrewkroh, i have started to implement diskio_windows to collect these counters with perfmon. See here. But I'm afraid that this can not solve the main problem between counters and gauges. With performance counters mostly you only get real time values. There are some counters which stores values since the system is booted but not for logical disk. So if we want cumulative values perfmon is not the right tool. I think we should use the DeviceIoControl function. Together with IOCTL_DISK_PERFORMANCE control code which gives us real counts for a disk

@andrewkroh
Copy link
Member

Good find. It does sounds like the DISK_PERFORMANCE data would be good for this.

@martinscholz83
Copy link
Contributor

Two questions.

  1. Should i keep the new created function PdhGetRawCounterArray?

  2. syscall has the functions DeviceIoControl and CreateFile. To get the local drive names we need a function like GetLogicalDriveStrings. This function is not implemented in syscall. So question is to implement all functions we need on our own or to use existing functions from sycall?

@andrewkroh
Copy link
Member

andrewkroh commented Dec 5, 2017

Should i keep the new created function PdhGetRawCounterArray?

Why would you remove it? Isn't it used by the perfmon metricset?

So question is to implement all functions we need on our own or to use existing functions from sycall?

Reuse what's already provided if possible. So only add a way to use GetLogicalDriveStrings.

@martinscholz83
Copy link
Contributor

PdhGetRawCounterArray and PdhGetRawCounterValue are not used by perfmon. We added these functions to offer an API if you want to calculate Raw values. But maybe in the future we can add an option for perfmon to use it?!

@Elufimov
Copy link
Contributor Author

Any news on this issue?

@marekduciuc
Copy link

It would be cool to investigate if other metrics like bellow have equivalent on windows.
system.diskio.iostat.read.request.merges_per_sec
system.diskio.iostat.request.avg_size

narph added a commit to narph/beats that referenced this issue Apr 19, 2019
(SELECT * FROM Win32_PerfFormattedData_PerfDisk_LogicalDisk)

with DeviceIOControl Win32 API method using IOCTL_DISK_PERFORMANCE control code.

Fixes: elastic#3798 and elastic#2842
narph added a commit that referenced this issue Apr 26, 2019
* Replace using WMI query to get the system/diskio metrics for Windows

(SELECT * FROM Win32_PerfFormattedData_PerfDisk_LogicalDisk)

with DeviceIOControl Win32 API method using IOCTL_DISK_PERFORMANCE control code.

Fixes: #3798 and #2842

* Fixed cross platform build, added tests and include_devices filter

Added:
- include_devices filter to the IOCounters
- test file to assert get stats on C: returns data
- addressed houndci-bot style violations

* Fix goimports style

* Fix goimports styling

* Fix goimports and houndci-bot requests

* Fix support for osx

* Re-run build, address houndci-bot messages

* Update changelog file

* Addressed PR comments

* Addressing review notes

* Enrich test and implement separate function to enable the performance counters

* Add disable performance counters functionality for testing

* Log meesage when the EnableCounterForIoctl is added/updated in the registry

* Check for registry key value before updating it

* Address new code reviews

* small refactoring of deviceiocontrol functions
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Metricbeat Metricbeat module Team:Integrations Label for the Integrations team :Windows
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants