Failed to get cluster name for elasticsearch during health check routine #2730

Closed
ascoppa opened this issue Jun 25, 2024 · 5 comments · Fixed by #2743
Labels
  • bug
  • community (To tag external issues and PRs submitted by the community)
  • jul-sep qtr (Represents proposed work item for the Jul-Sep quarter)

Comments

@ascoppa
ascoppa commented Jun 25, 2024

Description

After upgrading the New Relic agent from version 9.0.0 to 9.10.2, we began encountering multiple signature errors on our monitoring systems. Upon further inspection, the following errors were identified in our logs:

[NewRelic][2024-06-17 20:12:22 +0000 web.1 (60)] ERROR : Failed to get cluster name for elasticsearch

and immediately below this ☝️ line

[NewRelic] ERROR : Elasticsearch::Transport::Transport::Errors::Forbidden: [403] {"message":"The request signature 
we calculated does not match the signature you provided.  Check your AWS Secret Access Key and signing method. 
Consult the service documentation for details. The Canonical String for this request should have been 'GET / 
content-type:application/json host:[HIDDEN-FOR-SECURITY-REASONS]  user-agent:Faraday v1.10.2 
x-amz-content-sha256:[HIDDEN-FOR-SECURITY-REASONS] x-amz-date:20240617T201222Z 
x-elastic-client-meta:es=6.8.3,rb=3.1.4,t=6.8.3,fd=1.10.2,ty=1.4.0 content-type;host;user-agent;
x-amz-content-sha256;x-amz-date;x-elastic-client-meta [HIDDEN-FOR-SECURITY-REASONS]' 
The String-to-Sign should have been '[HIDDEN-FOR-SECURITY-REASONS]' "}

These requests appear to be made by the New Relic Ruby agent as part of a health check routine (they trigger repeatedly over time). As a workaround, we disabled the New Relic instrumentation for Elasticsearch by setting the environment variable NEW_RELIC_INSTRUMENTATION_ELASTICSEARCH to disabled. After doing so, the signature errors disappeared.
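
For context, the 403 quoted above has the shape of an AWS Signature Version 4 mismatch: the service rebuilds a canonical request from the headers it receives and compares the result with what the client signed. The Ruby sketch below only illustrates that mechanism and is not a diagnosis of this issue. The helper name `canonical_request`, the host `search.example.com`, and the altered `User-Agent` value are assumptions made for illustration, and comparing SHA-256 digests of the canonical request is a simplification of the full signing process.

```ruby
require 'digest'

# Illustrative helper (hypothetical name): builds a SigV4-style canonical
# request from the method, path, query string, headers, and payload.
def canonical_request(method:, path:, headers:, query: '', payload: '')
  # Header names are lowercased, values trimmed, and entries sorted by name.
  normalized = headers
    .map { |name, value| [name.to_s.downcase, value.to_s.strip.gsub(/\s+/, ' ')] }
    .sort_by(&:first)

  canonical_headers = normalized.map { |name, value| "#{name}:#{value}\n" }.join
  signed_headers = normalized.map(&:first).join(';')

  [
    method.upcase,
    path,
    query,
    canonical_headers,
    signed_headers,
    Digest::SHA256.hexdigest(payload)
  ].join("\n")
end

# Headers as the client might have signed them. The host is a placeholder.
signed = {
  'Content-Type' => 'application/json',
  'Host' => 'search.example.com',
  'User-Agent' => 'Faraday v1.10.2',
  'x-amz-content-sha256' => Digest::SHA256.hexdigest(''),
  'x-amz-date' => '20240617T201222Z',
  'x-elastic-client-meta' => 'es=6.8.3,rb=3.1.4,t=6.8.3,fd=1.10.2,ty=1.4.0'
}

# Assumption for illustration: one signed header differs by the time the
# request reaches the service.
received = signed.merge('User-Agent' => 'some-other-client')

at_signing = Digest::SHA256.hexdigest(
  canonical_request(method: 'GET', path: '/', headers: signed)
)
at_service = Digest::SHA256.hexdigest(
  canonical_request(method: 'GET', path: '/', headers: received)
)

# The digests differ, so a signature derived from one cannot match the other.
puts at_signing == at_service
```

Every header listed in the signed-headers line is covered by the signature, so a request issued outside the normal signing path, or with any of those headers changed, would be rejected in this way.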

Expected Behavior

No signature errors should appear during the health check routine.

Your Environment

  • Ruby -> 3.1.4
  • Rails -> 6.1.7.7
  • Elasticsearch -> 6.2
  • newrelic_rpm -> 9.10.2

Note

If you need any further clarification, please don't hesitate to ask. Thank you.

@ascoppa ascoppa added the bug label Jun 25, 2024
@github-actions github-actions bot added the community To tag external issues and PRs submitted by the community label Jun 25, 2024
@workato-integration workato-integration bot changed the title Failed to get cluster name for elasticsearch during health check routine Failed to get cluster name for elasticsearch during health check routine Jun 25, 2024
@hannahramadan
Contributor

Hi @ascoppa! Thanks for letting us know about the issue. Between agent versions 9.0.0 and 9.10.2, we started using a different endpoint to get the cluster name. This helped with performance but might be causing the error you're seeing.

We don't officially support Elasticsearch versions below 7, but we want to explore this issue a little further to see if this is an Elasticsearch version issue or something else. You also make a good point about the number of errors we're generating, so we're going to look into reducing those.

@ascoppa
Author

ascoppa commented Jul 8, 2024

Hi @hannahramadan, thank you for looking at this. I was OOO shortly after opening this issue so I wasn't able to properly follow the activity on this issue. Were you guys able to find anything relevant? Is there anything else I can do to help?

@kford-newrelic kford-newrelic added the jul-sep qtr Represents proposed work item for the Jul-Sep quarter label Jul 9, 2024
@tannalynn
Contributor

Hello @ascoppa

I tried to recreate this error using a supported elasticsearch version (7.10), and using AWS for elasticsearch since that appears to be the source of your error. Unfortunately I wasn't able to reproduce the error, and I was able to see the cluster name fine in my test.

It seems like this may be related to using elasticsearch 6, or perhaps some other unknown variable, such as specific AWS security settings.

Considering it's not ideal that the agent keeps trying to get the cluster name and logs the error every time the client makes a call, would it be a usable solution for you if we instead updated it so that each Elasticsearch client instance tries to get the cluster name one time only, and if it fails (like yours does), it simply won't try again? Of course, if you're creating many client instances, it might not be quite as helpful as I'm hoping, but if a change like that sounds good to you, just let me know!

@tannalynn
Contributor

I updated the Elasticsearch instrumentation to only try to get the cluster name once per client instance in #2743.
Since this seems to be related to running an unsupported Elasticsearch version, or to something else outside of the agent (like AWS security settings), and we haven't reproduced it with a supported Elasticsearch version running on AWS, we'll be closing the issue once that PR is merged.
Hopefully decreasing the number of attempts to get the cluster name helps!
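
For readers following along, the "try once per client instance" behavior described above could be sketched roughly like this. It is a minimal illustration, not the code from #2743: the class name `ClusterNameLookup`, the block-based fetcher, and the optional logger are all assumptions.

```ruby
# Hypothetical wrapper illustrating "attempt once, then stop retrying".
# One instance of this object would live alongside each client instance.
class ClusterNameLookup
  def initialize(logger: nil, &fetcher)
    @fetcher = fetcher      # block that performs the real lookup
    @logger = logger        # optional; anything responding to #error
    @attempted = false
    @cluster_name = nil
  end

  # Returns the cached name, or nil if the single attempt failed.
  def cluster_name
    return @cluster_name if @attempted

    # Mark the attempt before calling out, so a failure is never retried.
    @attempted = true
    @cluster_name = @fetcher.call
  rescue StandardError => e
    @logger&.error("Failed to get cluster name for elasticsearch: #{e.class}")
    nil
  end
end

# Usage: a fetcher that always fails is invoked (and logged) only once.
calls = 0
failing = ClusterNameLookup.new { calls += 1; raise 'forbidden' }
3.times { failing.cluster_name }
puts calls

# A successful lookup is cached in the same way.
working = ClusterNameLookup.new { 'my-cluster' }
puts working.cluster_name
```

Because the flag is set before the fetch, a failing cluster produces one logged error per client instance rather than one per call. The sketch is not thread-safe, and an application that creates many short-lived clients would still see one attempt for each of them.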

4 participants