-
Notifications
You must be signed in to change notification settings - Fork 599
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Create :'elasticsearch.perform_health_check' config #2369
Conversation
@@ -313,6 +313,7 @@ def self.enforce_fallback(allowed_values: nil, fallback: nil) | |||
'webpacker:compile' | |||
].join(',').freeze | |||
|
|||
# rubocop:disable Metrics/CollectionLiteralLength |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The NewRelic::Agent::Configuration::DEFAULTS
hash now exceeds 250 lines, which violates the Metrics/CollectionLiteralLength
rule. Moving DEFAULTS
into a separate data storage file, like YAML or JSON would impact many other parts of our system, such as the generation of our documentation using HEREDOCs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
^-- This is a summary of a discussion with @fallwith and @hannahramadan about temporarily disabling this rule. Should it be included in a source code comment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems fine to me, but i'm good with it either way!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reminder for @hannahramadan to update #2367 to match
@nr_cluster_name ||= if NewRelic::Agent.config[:'elasticsearch.perform_health_check'] | ||
perform_request('GET', '_cluster/health').body['cluster_name'] | ||
else | ||
NewRelic::UNKNOWN |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Technically the database_name
attribute on a datastore segment can be nil. Would that make more sense here than Unknown
? I think it's closer to the API's expectations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm I think one of the benefits to not using nil is that we won't have to check the config every single time. If we did set it to nil, then line 55 wouldn't return early anymore, so we'd have to do everything else every time and itd take a bit longer. Not sure if it would be a concerning increase though, or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's right! Thanks for setting me straight :)
We could add another guard clause between lines 54 and 55 to check for the config and return early if it's false, but that might still be less performant than just setting "Unknown" since we'd be adding a hash lookup every time instead of a condition just once.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think 'Unknown'
vs nil
is worth discussing for what gets sent up in the db span. Originally I felt that 'Unknown'
would convey a failure to obtain the name or a deliberate short-circuit preventing the discovery, whereas nil
would convey that the name is unset (given that Rails apps don't require the name in database.yml
).
If nil
is thought to be more appropriate, we can work around the performance implications. Perhaps defined?
is faster than a hash lookup.
- @nr_cluster_name ||= if NewRelic::Agent.config[:'elasticsearch.perform_health_check']
+ return @nr_cluster_name if defined?(@nr_cluster_name)
+
+ @nr_cluster_name = if NewRelic::Agent.config[:'elasticsearch.perform_health_check']
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
start_datastore_segment
's default arguments set database_name
to nil
, which leads me to believe that's a more standard value than Unknown
.
The code change I alluded to slightly differs from what you've suggested above.
Using the method as it exists on dev
, the change would be:
def nr_cluster_name
+ return unless NewRelic::Agent.config[:'elasticsearch.perform_health_check']
return @nr_cluster_name if @nr_cluster_name
return if nr_hosts.empty?
NewRelic::Agent.disable_all_tracing do
@nr_cluster_name ||= perform_request('GET', '_cluster/health').body['cluster_name']
end
rescue StandardError => e
NewRelic::Agent.logger.error('Failed to get cluster name for elasticsearch', e)
nil
end
So the NewRelic::Agent.config
check would occur every time #nr_cluster_name
is called, instead of when the @nr_cluster_name
variable is defined/assigned. I don't know if there's a way to make that config check more performant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I was definitely thinking from the perspective of a change to config option only taking effect on a complete restart of the agent, so we'd only check the config hash one time and then either cache the string cluster name value or nil
. And then use defined?
for memoization help with nil
. If you actually want to check the config freshly each time, then a hash lookup in a relatively small hash with symbols as keys shouldn't be too bad and the code as-is works great.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ohh, I understand now. Thanks for clarifying! Your latest comment made me realize we don't need to check the config that often. It's reasonable to expect the config won't change during Elasticsearch client initialization. I was getting overly fixated on the test scenario where I needed to create a new client. Thank you!
:default => true, | ||
:public => true, | ||
:type => Boolean, | ||
:allowed_from_server => true, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is nothing on the server side that would update this config, so this should probably be false.
@nr_cluster_name ||= if NewRelic::Agent.config[:'elasticsearch.perform_health_check'] | ||
perform_request('GET', '_cluster/health').body['cluster_name'] | ||
else | ||
NewRelic::UNKNOWN |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm I think one of the benefits to not using nil is that we won't have to check the config every single time. If we did set it to nil, then line 55 wouldn't return early anymore, so we'd have to do everything else every time and itd take a bit longer. Not sure if it would be a concerning increase though, or not.
@@ -313,6 +313,7 @@ def self.enforce_fallback(allowed_values: nil, fallback: nil) | |||
'webpacker:compile' | |||
].join(',').freeze | |||
|
|||
# rubocop:disable Metrics/CollectionLiteralLength |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems fine to me, but i'm good with it either way!
@nr_cluster_name ||= if NewRelic::Agent.config[:'elasticsearch.perform_health_check'] | ||
perform_request('GET', '_cluster/health').body['cluster_name'] | ||
else | ||
NewRelic::UNKNOWN |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's right! Thanks for setting me straight :)
We could add another guard clause between lines 54 and 55 to check for the config and return early if it's false, but that might still be less performant than just setting "Unknown" since we'd be adding a hash lookup every time instead of a condition just once.
239ea65
to
a330da8
Compare
To access the cluster name to assign to the database_name attribute on the Elasticsearch datastore segment, we currently make a request to the '_cluster/health' endpoint. This can cause CPU and network spikes when large clusters start. To avoid this performance issue, the `:'elasticsearch.perform_health_check` config value can be set to false. This will set the database_name attribute on the Elasticsearch datastore segments to nil.
a330da8
to
b03ea44
Compare
SimpleCov Report
|
@kaylareopelle Out of curiosity, why are we using the edit: A colleague also just pointed out to me that simply hitting |
Hi @gremerritt, those are great suggestions! We considered trying out If we set that up in a branch, would you be willing to test it in a staging environment to see how that endpoint impacts your cluster? We'd probably prefer to use |
@kaylareopelle thanks for the update. Yes @erikkessler1 and I have been working together on this issue. Thanks for posting that branch, I think we'll have some bandwidth to give it a test. From some more discussion on our team, it sounds like the "canonical" way to fetch the cluster name is by hitting the base @nr_cluster_name ||= perform_request('GET', '/').body['cluster_name'] We're happy to test out either. Thanks for digging into this! |
Hi @gremerritt! Thanks for your response! I appreciate the work you and your team are doing to help us find the right endpoint! Out of curiosity, what helped your team determine the base endpoint, I think it's a great next test! I've created #2377 on branch |
Closing in favor of #2377 |
To access the cluster name to assign to the
database_name
attribute on the Elasticsearch datastore segment, we currently make a request to the'_cluster/health'
endpoint. This can cause CPU and network spikes when large clusters start.To avoid this performance issue, we're introducing the
:'elasticsearch.perform_health_check'
config value. This value can be set to false to stop the agent from making requests to requests to'_cluster/health'
.When this config is false, the
database_name
attribute on all Elasticsearch datastore segments will be 'Unknown'.Relates to: #2360