Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Services collection using API (no WMI) #812

Merged
merged 3 commits into from
Aug 25, 2021

Conversation

carlossscastro
Copy link
Contributor

Team,

As we have mentioned before in #771, we are currently facing some issues while trying to run the Windows Exporter on machines under certain circumstances, with those being specially memory and CPU pressure. After doing some research, we have traced these issues back to the use of the WMI query interface. WMI is known to be not very well performant and to have issues when the machine is under this kind of resource pressure, as reported for example here, here and here.

In this PR, we implement an alternative way of collecting windows service metrics, which is the use case we currently face the most challenging, through the windows APIs rather than WMI. We have tested this solution and it does solve most of the problems we faced when running the exporter in "hot" machines. Use of the new mechanism is opt-in, by specifying the collector.service.useAPI flag.

Unfortunately, when this new mechanism is used, the collector.service.services-where flag, based on WQL, which was previously appended to the WMI query, can no longer be honored, and if both are present we just log a warning to inform the user about this. Additionally, the status field of the service is not reported either, as this was also produced by WMI. Users can instead look at the state property which should have the same information, although formatted in a different way.

We'd greatly appreciate some feedback on the approach! Any suggestions on both the code and the UX of the new behavior would be more than welcome.

@nadiamoe
Copy link

nadiamoe commented Aug 4, 2021

Hi there! It's been a while since we opened this PR, is there anything we can do to get some feedback on it? We're open to make changes in the design and/or implementation if the current patch is not aligned with what you're looking for.

If you could take a look @carlpett @breed808 it would be fantastic!
And apologies for pinging you directly, I saw you were active reviewers in this repo and I am not sure if the team mentioned in the CODEOWNERS file is up to date 😕

Many thanks in advance!

@breed808
Copy link
Contributor

breed808 commented Aug 6, 2021

Hi all, apologies for the delayed response with this. I've had some personal matters taking away from my free time in the past couple months.

I'm quite happy with the code, particularly that it's behind a disabled-by-default flag, so this can be introduced with minimal impact for existing users 👍

I've run some synthetic benchmarks between the two service collector sources, and I'm getting a ~15% increase in collector time compared to the WMI source. Is this expected?
It's not a big deal as the the anticipated gains in stability more than outweigh a slight performance decrease.

@nadiamoe
Copy link

Thanks a lot for the feedback @breed808!

This surprises me as well, I'm not 100% sure about the exact benchmarks we have made in our end, but I'd also expect the API collection to be faster than WMI.

I'll check with the team and see if we can reproduce and track down that slowdown to something concrete.

@breed808
Copy link
Contributor

Thanks @roobre! I wouldn't consider this a blocker for merging, but it would be nice to see if you can reproduce it before we continue 👍

@carlossscastro
Copy link
Contributor Author

@breed808

We were able to spot the issue. There was a logic problem on the code where the exporter was actually collecting metrics twice.
We just submitted the fix so if you could now run your tests against it would be great.

Let us know if any other changes are needed.

Thanks again for your feedback!

Copy link
Contributor

@breed808 breed808 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice! I can see a ~80% decrease in query time with the API method, with the recent changes.
You'll need to sign the DCO in order for the CI to pass, then we're good to merge.

Signed-off-by: Carlos Castro <ccastro@newrelic.com>
Signed-off-by: Carlos Castro <ccastro@newrelic.com>
Signed-off-by: Carlos Castro <ccastro@newrelic.com>
@carlossscastro
Copy link
Contributor Author

@breed808

Thanks for testing. Great to see some nice performance improvements.
I pushed again with a signed commit.

Let us know if anything else is needed!

@breed808
Copy link
Contributor

Excellent, thanks for that!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants