Failed to get cluster name for elasticsearch during health check routine #2730

ascoppa · 2024-06-25T12:56:02Z

Description

After upgrading the New Relic agent from version 9.0.0 to 9.10.2, we began encountering multiple signature errors on our monitoring systems. Upon further inspection, the following errors were identified in our logs:

[NewRelic][2024-06-17 20:12:22 +0000 web.1 (60)] ERROR : Failed to get cluster name for elasticsearch

and immediately below this ☝️ line

[NewRelic] ERROR : Elasticsearch::Transport::Transport::Errors::Forbidden: [403] {"message":"The request signature 
we calculated does not match the signature you provided.  Check your AWS Secret Access Key and signing method. 
Consult the service documentation for details. The Canonical String for this request should have been 'GET / 
content-type:application/json host:[HIDDEN-FOR-SECURITY-REASONS]  user-agent:Faraday v1.10.2 
x-amz-content-sha256:[HIDDEN-FOR-SECURITY-REASONS] x-amz-date:20240617T201222Z 
x-elastic-client-meta:es=6.8.3,rb=3.1.4,t=6.8.3,fd=1.10.2,ty=1.4.0 content-type;host;user-agent;
x-amz-content-sha256;x-amz-date;x-elastic-client-meta [HIDDEN-FOR-SECURITY-REASONS]' 
The String-to-Sign should have been '[HIDDEN-FOR-SECURITY-REASONS]' "}

These requests appear to be made by the New Relic Ruby agent as part of a health check routine (they recur repeatedly over time). As a workaround, we disabled New Relic's Elasticsearch instrumentation by setting the environment variable NEW_RELIC_INSTRUMENTATION_ELASTICSEARCH to disabled. After doing so, the signature errors disappeared.

Expected Behavior

No signature errors should appear during the health check routine.

Your Environment

Ruby -> 3.1.4
Rails -> 6.1.7.7
Elasticsearch -> 6.2
newrelic_rpm -> 9.10.2

Note

If you need any further clarification, please don't hesitate to ask. Thank you.


workato-integration · 2024-06-25T12:56:06Z

https://new-relic.atlassian.net/browse/NR-284469

hannahramadan · 2024-06-27T18:25:59Z

Hi @ascoppa! Thanks for letting us know about the issue. Between agent versions 9.0.0 and 9.10.2, we started using a different endpoint to get the cluster name. This helped with performance but might be causing the error you're seeing.

We don't officially support Elasticsearch versions below 7, but we want to explore this issue a little further to see if this is an Elasticsearch version issue or something else. You also make a good point about the number of errors we're generating, so we're going to look into reducing those.

ascoppa · 2024-07-08T18:43:39Z

Hi @hannahramadan, thank you for looking at this. I was OOO shortly after opening this issue so I wasn't able to properly follow the activity on this issue. Were you guys able to find anything relevant? Is there anything else I can do to help?

tannalynn · 2024-07-09T21:50:25Z

Hello @ascoppa

I tried to recreate this error using a supported elasticsearch version (7.10), and using AWS for elasticsearch since that appears to be the source of your error. Unfortunately I wasn't able to reproduce the error, and I was able to see the cluster name fine in my test.

It seems like this may be related to using elasticsearch 6, or perhaps some other unknown variable, such as specific AWS security settings.

Since it's not ideal that the agent keeps trying to get the cluster name and logs the error every time the client makes a call, would it be a usable solution for you if we instead updated the instrumentation so that each elasticsearch client instance tries to get the cluster name only once and, if that fails (as yours does), simply doesn't try again? Of course, if you're creating many client instances, it might not be quite as helpful as I'm hoping, but if a change like that sounds good to you, just let me know!

tannalynn · 2024-07-11T20:02:42Z

I updated the elasticsearch instrumentation to only try to get the cluster name once per client instance here #2743.
Since it seems this is probably related to running an unsupported elasticsearch version, or to something else outside of the agent (like AWS security settings), and we haven't reproduced it with a supported elasticsearch version running on AWS, we'll be closing the issue once that PR is merged.
Hopefully decreasing the number of attempts to get the cluster name helps!
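
For readers following along, the "try once per client instance" idea described above can be sketched roughly as follows. This is a minimal illustration only, not the agent's actual code from #2743: the `ClusterNameLookup` class, its `attempts` counter, and the block-based fetcher are all hypothetical names introduced for the example.

```ruby
# Hypothetical sketch: remember that a lookup was attempted, so a failing
# lookup is tried (and logged) only once per instance.
class ClusterNameLookup
  attr_reader :attempts

  # The block stands in for whatever request fetches the cluster name.
  def initialize(&fetcher)
    @fetcher = fetcher
    @attempted = false
    @cluster_name = nil
    @attempts = 0
  end

  def cluster_name
    # Return the cached result (possibly nil) after the first attempt.
    return @cluster_name if @attempted

    @attempted = true
    @attempts += 1
    @cluster_name = @fetcher.call
  rescue StandardError => e
    # Log once; later calls hit the early return above and stay silent.
    warn("Failed to get cluster name for elasticsearch: #{e.message}")
    @cluster_name = nil
  end
end

# A lookup that always fails is still only attempted one time.
lookup = ClusterNameLookup.new { raise 'simulated 403 Forbidden' }
3.times { lookup.cluster_name }
```

Because the flag lives on the instance, an application that creates many client instances would still see one failed attempt per instance, which is the caveat raised earlier in the thread.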

ascoppa added the bug label Jun 25, 2024

github-actions bot added the community label (To tag external issues and PRs submitted by the community) Jun 25, 2024

workato-integration bot changed the title ~~Failed to get cluster name for elasticsearch during health check routine~~ Failed to get cluster name for elasticsearch during health check routine Jun 25, 2024

hannahramadan mentioned this issue Jun 25, 2024

Lockdown Elasticsearch instrumentation to 7+ #2731

Closed

kford-newrelic added this to the Ruby Agent - FY25Q2 KTLO & Core Instrumentation Improvements milestone Jul 9, 2024

kford-newrelic added the jul-sep qtr label (Represents proposed work item for the Jul-Sep quarter) Jul 9, 2024

tannalynn mentioned this issue Jul 11, 2024

Elasticsearch: Only try once per client instance to get cluster name #2743

Merged

tannalynn closed this as completed in #2743 Jul 11, 2024
