Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sidekiq args filtration #2177

Merged
merged 17 commits into from
Aug 30, 2023
Merged
Show file tree
Hide file tree
Changes from 10 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions .rubocop.yml
Original file line number Diff line number Diff line change
Expand Up @@ -1259,6 +1259,7 @@ Style/MethodCallWithArgsParentheses:
- raise
- require
- skip
- sleep
- source
- stub
- stub_const
Expand Down
36 changes: 33 additions & 3 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -1,14 +1,44 @@
# New Relic Ruby Agent Release Notes


## dev

Version <dev> allows the agent to record additional response information on a transaction when middleware instrumentation is disabled and fixes a bug in `NewRelic::Rack::AgentHooks.needed?`.

Version <dev> allows the agent to record additional response information on a transaction when middleware instrumentation is disabled, introduces new `:'sidekiq.args.include'` and `:'sidekiq.args.exclude:` configuration options to permit capturing only certain Sidekiq job arguments, and fixes a bug in `NewRelic::Rack::AgentHooks.needed?`.

- **Feature: Report transaction HTTP status codes when middleware instrumentation is disabled**
Previously, when `disable_middleware_instrumentation` was set to `true`, the agent would not record the value of the response code or content type on the transaction. This was due to the possibility that a middleware could alter the response, which would not be captured by the agent when the middleware instrumentation was disabled. However, based on customer feedback, the agent will now report the HTTP status code and content type on a transaction when middleware instrumentation is disabled. [PR#2175](https://github.com/newrelic/newrelic-ruby-agent/pull/2175)

- **Feature: Permit capturing only certain Sidekiq job arguments**
New `:'sidekiq.args.include'` and `:'sidekiq.args.exclude'` configuration options have been introduced to permit fine grained control over which Sidekiq job arguments (args) are reported to New Relic. By default, no Sidekiq args are reported. To report any Sidekiq options, the `:'attributes.include'` array must include the string `'jobs.sidekiq.args.*'`. With that string in place, all arguments will be reported unless one or more of the new include/exclude options are used. The `:'sidekiq.args.include'` option can be set to an array of strings. Each of those strings will be passed to `Regexp.new` and collectively serve as an allowlist for desired args. For job arguments that are hashes, if a hash's key matches one of the include patterns, then both the key and its corresponding value will be included. For scalar arguments, the string representation of the scalar will need to match one of the include patterns to be captured. The `:'sidekiq.args.exclude'` option works similarly. It can be set to an array of strings that will each be passed to `Regexp.new` to create patterns. These patterns will collectively serve as a denylist for unwanted job args. Any hash key, has value, or scalar that matches an exclude pattern will be excluded (not sent to New Relic). [PR#2177](https://github.com/newrelic/newrelic-ruby-agent/pull/2177)
fallwith marked this conversation as resolved.
Show resolved Hide resolved

`newrelic.yml` examples:

Any string in the `:'sidekiq.args.include'` or `:'sidekiq.args.exclude'` arrays will be turned into a regular expression. Knowledge of [Ruby regular expression support](https://ruby-doc.org/3.2.2/Regexp.html) can be leveraged but is not required. If regular expression syntax is not used, inexact matches will be performed and the string "Fortune" will match both "Fortune 500" and "Fortune and Glory". For exact matches, use [regular expression anchors](https://ruby-doc.org/3.2.2/Regexp.html#class-Regexp-label-Anchors).

```yaml
# Include any argument whose string representation matches either "apple" or "banana"
# The "apple" pattern will match both "green apple" and "red apple"
sidekiq.args.include:
- apple
- banana
fallwith marked this conversation as resolved.
Show resolved Hide resolved

# Exclude any arguments that match either "grape", "orange", or "pear"
sidekiq.args.exclude:
- grape
- orange
- pear

# Exclude any argument that is a 9 digit number
sidekiq.args.exclude:
- '\d{9}'

# Include anything that starts with "blue" but exclude anything that ends in "green"
sidekiq.args.include
- '^blue'

sidekiq.args.exclude
- 'green$'
```

- **Bugfix: Resolve inverted logic of NewRelic::Rack::AgentHooks.needed?**
Previously, `NewRelic::Rack::AgentHooks.needed?` incorrectly used inverted logic. This has now been resolved, allowing AgentHooks to be installed when `disable_middleware_instrumentation` is set to true. [PR#2175](https://github.com/newrelic/newrelic-ruby-agent/pull/2175)

Expand Down
97 changes: 97 additions & 0 deletions lib/new_relic/agent/attribute_processing.rb
Original file line number Diff line number Diff line change
Expand Up @@ -9,6 +9,8 @@ module AttributeProcessing

EMPTY_HASH_STRING_LITERAL = '{}'.freeze
EMPTY_ARRAY_STRING_LITERAL = '[]'.freeze
PRE_FILTER_KEYS = %i[include exclude].freeze
DISCARDED = :nr_discarded

def flatten_and_coerce(object, prefix = nil, result = {}, &blk)
if object.is_a?(Hash)
Expand Down Expand Up @@ -57,6 +59,101 @@ def flatten_and_coerce_array(array, prefix, result, &blk)
end
end
end

def formulate_regexp_union(option)
fallwith marked this conversation as resolved.
Show resolved Hide resolved
return if NewRelic::Agent.config[option].empty?

Regexp.union(NewRelic::Agent.config[option].map { |p| string_to_regexp(p) }.uniq.compact).freeze
rescue StandardError => e
NewRelic::Agent.logger.warn("Failed to formulate a Regexp union from the '#{option}' configuration option " +
"- #{e.class}: #{e.message}")
end

def string_to_regexp(str)
Regexp.new(str)
rescue StandardError => e
NewRelic::Agent.logger.warn("Failed to initialize Regexp from string '#{str}' - #{e.class}: #{e.message}")
end

# attribute filtering suppresses data that has already been flattened
# and coerced (serialized as text) via #flatten_and_coerce, and is
# restricted to basic text matching with a single optional wildcard.
# pre filtering operates on raw Ruby objects beforehand and uses full
# Ruby regex syntax
def pre_filter(values = [], options = {})
return values unless !options.empty? && PRE_FILTER_KEYS.any? { |k| options.key?(k) }

# if there's a prefix in play for (non-pre) attribute filtration and
# attribute filtration won't allow that prefix, then don't even bother
# with pre filtration that could only result in values that would be
# blocked
if options.key?(:attribute_namespace) &&
!NewRelic::Agent.instance.attribute_filter.might_allow_prefix?(options[:attribute_namespace])
return values
end

values.each_with_object([]) do |element, filtered|
object = pre_filter_object(element, options)
filtered << object unless discarded?(object)
end
end

def pre_filter_object(object, options)
if object.is_a?(Hash)
pre_filter_hash(object, options)
elsif object.is_a?(Array)
pre_filter_array(object, options)
else
pre_filter_scalar(object, options)
end
end

def pre_filter_hash(hash, options)
filtered_hash = hash.each_with_object({}) do |(key, value), filtered|
filtered_key = pre_filter_object(key, options)
next if discarded?(filtered_key)

# If the key is permitted, skip include filtration for the value
# but still apply exclude filtration
if options.key?(:exclude)
exclude_only = options.dup
exclude_only.delete(:include)
filtered_value = pre_filter_object(value, exclude_only)
next if discarded?(filtered_value)
else
filtered_value = value
end

filtered[filtered_key] = filtered_value
end

filtered_hash.empty? && !hash.empty? ? DISCARDED : filtered_hash
end

def pre_filter_array(array, options)
filtered_array = array.each_with_object([]) do |element, filtered|
filtered_element = pre_filter_object(element, options)
next if discarded?(filtered_element)

filtered.push(filtered_element)
end

filtered_array.empty? && !array.empty? ? DISCARDED : filtered_array
end

def pre_filter_scalar(scalar, options)
return DISCARDED if options.key?(:include) && !scalar.to_s.match?(options[:include])
return DISCARDED if options.key?(:exclude) && scalar.to_s.match?(options[:exclude])

scalar
end

# `nil`, empty enumerable objects, and `false` are all valid in their
# own right as application data, so pre-filtering uses a special value
# to indicate that filtered out data has been discarded
def discarded?(object)
object == DISCARDED
end
end
end
end
26 changes: 26 additions & 0 deletions lib/new_relic/agent/configuration/default_source.rb
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,7 @@
# frozen_string_literal: true

require 'forwardable'
require_relative '../../constants'

module NewRelic
module Agent
Expand Down Expand Up @@ -1715,6 +1716,31 @@ def self.enforce_fallback(allowed_values: nil, fallback: nil)
:transform => DefaultSource.method(:convert_to_regexp_list),
:description => 'Define transactions you want the agent to ignore, by specifying a list of patterns matching the URI you want to ignore. For more detail, see [the docs on ignoring specific transactions](/docs/agents/ruby-agent/api-guides/ignoring-specific-transactions/#config-ignoring).'
},
# Sidekiq
:'sidekiq.args.include' => {
default: NewRelic::EMPTY_ARRAY,
kaylareopelle marked this conversation as resolved.
Show resolved Hide resolved
public: true,
type: Array,
allowed_from_server: false,
description: 'An array of strings that will collectively serve as an allowlist for filtering which Sidekiq ' +
'job arguments get reported to New Relic. The capturing of any Sidekiq arguments requires that ' +
"'job.sidekiq.args.*' be added to the separate :'attributes.include' configuration option. Each " +
'string in this array will be turned into a regular expression via `Regexp.new` to permit advanced ' +
'matching. For job argument hashes, if either a key or value matches the pair will be included. All ' +
'matching job argument array elements and job argument scalars will be included.'
fallwith marked this conversation as resolved.
Show resolved Hide resolved
},
:'sidekiq.args.exclude' => {
default: NewRelic::EMPTY_ARRAY,
public: true,
type: Array,
allowed_from_server: false,
description: 'An array of strings that will collectively serve as a denylist for filtering which Sidekiq ' +
'job arguments get reported to New Relic. The capturing of any Sidekiq arguments requires that ' +
"'job.sidekiq.args.*' be added to the separate :'attributes.include' configuration option. Each string " +
'in this array will be turned into a regular expression via `Regexp.new` to permit advanced matching. ' +
'For job argument hashes, if either a key or value matches the pair will be excluded. All matching job ' +
'argument array elements and job argument scalars will be excluded.'
},
# Slow SQL
:'slow_sql.enabled' => {
:default => value_of(:'transaction_tracer.enabled'),
Expand Down
26 changes: 23 additions & 3 deletions lib/new_relic/agent/instrumentation/sidekiq/server.rb
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,10 @@ class Server
include NewRelic::Agent::Instrumentation::ControllerInstrumentation
include Sidekiq::ServerMiddleware if defined?(Sidekiq::ServerMiddleware)

ATTRIBUTE_BASE_NAMESPACE = 'sidekiq.args'
ATTRIBUTE_FILTER_TYPES = %i[include exclude].freeze
ATTRIBUTE_JOB_NAMESPACE = :"job.#{ATTRIBUTE_BASE_NAMESPACE}"

# Client middleware has additional parameters, and our tests use the
# middleware client-side to work inline.
def call(worker, msg, queue, *_)
Expand All @@ -18,10 +22,16 @@ def call(worker, msg, queue, *_)
trace_headers = msg.delete(NewRelic::NEWRELIC_KEY)

perform_action_with_newrelic_trace(trace_args) do
NewRelic::Agent::Transaction.merge_untrusted_agent_attributes(msg['args'], :'job.sidekiq.args',
NewRelic::Agent::AttributeFilter::DST_NONE)
NewRelic::Agent::Transaction.merge_untrusted_agent_attributes(
NewRelic::Agent::AttributeProcessing.pre_filter(msg['args'], self.class.nr_attribute_options),
ATTRIBUTE_JOB_NAMESPACE,
NewRelic::Agent::AttributeFilter::DST_NONE
)

if ::NewRelic::Agent.config[:'distributed_tracing.enabled'] && trace_headers&.any?
::NewRelic::Agent::DistributedTracing::accept_distributed_trace_headers(trace_headers, 'Other')
end
fallwith marked this conversation as resolved.
Show resolved Hide resolved

::NewRelic::Agent::DistributedTracing::accept_distributed_trace_headers(trace_headers, 'Other') if ::NewRelic::Agent.config[:'distributed_tracing.enabled'] && trace_headers&.any?
yield
end
end
Expand All @@ -33,5 +43,15 @@ def self.default_trace_args(msg)
:category => 'OtherTransaction/SidekiqJob'
}
end

def self.nr_attribute_options
@nr_attribute_options ||= begin
ATTRIBUTE_FILTER_TYPES.each_with_object({}) do |type, opts|
pattern =
NewRelic::Agent::AttributeProcessing.formulate_regexp_union(:"#{ATTRIBUTE_BASE_NAMESPACE}.#{type}")
opts[type] = pattern if pattern
end.merge(attribute_namespace: ATTRIBUTE_JOB_NAMESPACE)
end
end
end
end
5 changes: 5 additions & 0 deletions sidekiq_todo
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
- Supportability metric on every job call?
- Perform sync not reported?
- Should we report job success/failure status and queue name?


8 changes: 8 additions & 0 deletions test.rb
Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@
# frozen_string_literal: true

string = 'hello string'
array = %w[hello array]
symbol = :hello_symbol
regexp = /hello regex/

[string, array, symbol, regexp].each { |o| puts "#{o} (#{o.class}) frozen? => #{o.frozen?}" }
8 changes: 8 additions & 0 deletions test/helpers/config_scanning.rb
Original file line number Diff line number Diff line change
Expand Up @@ -15,6 +15,11 @@ module ConfigScanning
EVENT_BUFFER_MACRO_PATTERN = /(capacity_key|enabled_key)\s+:(['"])?([a-z\._]+)\2?/
ASSIGNED_CONSTANT_PATTERN = /[A-Z]+\s*=\s*:(['"])?([a-z\._]+)\1?\s*/

# These config settings shouldn't be worried about, possibly because they
# are only referenced via Ruby metaprogramming that won't work with this
# module's regex matching
IGNORED = %i[sidekiq.args.include sidekiq.args.exclude]

def scan_and_remove_used_entries(default_keys, non_test_files)
non_test_files.each do |file|
lines_in(file).each do |line|
Expand All @@ -30,6 +35,9 @@ def scan_and_remove_used_entries(default_keys, non_test_files)
captures.flatten.compact.each do |key|
default_keys.delete(key.delete("'").to_sym)
end

IGNORED.each { |key| default_keys.delete(key) }
fallwith marked this conversation as resolved.
Show resolved Hide resolved

# Remove any config keys that are annotated with the 'dynamic_name' setting
# This indicates that the names of these keys are constructed dynamically at
# runtime, so we don't expect any explicit references to them in code.
Expand Down
16 changes: 0 additions & 16 deletions test/multiverse/suites/sidekiq/after_suite.rb

This file was deleted.

Loading
Loading