-
Notifications
You must be signed in to change notification settings - Fork 629
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add botocore db.statement sanitization for dynamoDB #1662
base: main
Are you sure you want to change the base?
Add botocore db.statement sanitization for dynamoDB #1662
Conversation
...pentelemetry-instrumentation-botocore/src/opentelemetry/instrumentation/botocore/__init__.py
Outdated
Show resolved
Hide resolved
...pentelemetry-instrumentation-botocore/src/opentelemetry/instrumentation/botocore/__init__.py
Outdated
Show resolved
Hide resolved
...pentelemetry-instrumentation-botocore/src/opentelemetry/instrumentation/botocore/__init__.py
Outdated
Show resolved
Hide resolved
@@ -354,6 +374,12 @@ def __init__(self, call_context: _AwsSdkCallContext): | |||
def extract_attributes(self, attributes: _AttributeMapT): | |||
attributes[SpanAttributes.DB_SYSTEM] = DbSystemValues.DYNAMODB.value | |||
attributes[SpanAttributes.DB_OPERATION] = self._call_context.operation | |||
attributes[SpanAttributes.DB_STATEMENT] = str( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i wonder if capturing the db.statement
for DynamoDB is really applicable in terms of semantic conventions, at least in the way how it is captured here.
right now this would capture (more or less) a serialized JSON string consisting of all the parameters that were passed to the DynamoDb SDK function (e.g. client.get_item()). This doesn't seem to look like defined format and backends will probably have a hard time interpreting it.
The instrumentation is also already capturing some of the parameters (e.g. TableName, ConsistentRead, ConsumedCapacity, ...) in other span attributes. So there will probably be some redundant/unrelated parts in the db.statement
.
Additionally, boto3 provides at least two ways to call certain APIs. E.g. for the get_item
(and most other) functions there exists a client API and a convenience table API. They are mostly the sames but slightly differ in e.g. how the Key
parameter needs to be passed. For a captured unsanitized db.statement
attribute this would probably mean that the statement would differ depending on which API is used.
Furthermore, there exists also the PartiQL query language that is supported by certain DynamoDB APIs e.g. the execute_statement. Currently the instrumentation doesn't capture any additional span attributes for any of these PartiQL specific APIs, but it might lead to inconsistencies in case support for it is added in the future, where then one API captures the db.statement
as serialized JSON and another API captures PartiQL query string.
...y-instrumentation-botocore/src/opentelemetry/instrumentation/botocore/extensions/dynamodb.py
Outdated
Show resolved
Hide resolved
…add/botocore-sanitization # Conflicts: # instrumentation/opentelemetry-instrumentation-botocore/src/opentelemetry/instrumentation/botocore/__init__.py # instrumentation/opentelemetry-instrumentation-botocore/src/opentelemetry/instrumentation/botocore/extensions/dynamodb.py # instrumentation/opentelemetry-instrumentation-botocore/tests/test_botocore_dynamodb.py
...etry-instrumentation-botocore/src/opentelemetry/instrumentation/botocore/extensions/types.py
Show resolved
Hide resolved
...pentelemetry-instrumentation-botocore/src/opentelemetry/instrumentation/botocore/__init__.py
Outdated
Show resolved
Hide resolved
@@ -161,7 +178,9 @@ def _patched_api_call(self, original_func, instance, args, kwargs): | |||
if context_api.get_value(_SUPPRESS_INSTRUMENTATION_KEY): | |||
return original_func(*args, **kwargs) | |||
|
|||
call_context = _determine_call_context(instance, args) | |||
call_context = _determine_call_context( | |||
instance, args + (self.configuration,) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
args
is what the botocore library passes to botocore.client.BaseClient._make_api_call
(the instrumented function). i think currently it is the name of the operation (e.g. Lambda.ListLayers
) and the input parameters (e.g. client.list_layers(<input>)
) as dictionary.
This might potentially change in the future, so i think it would be better to pass the configuration
as separate parameter to _determine_call_context
(and in turn to _AwsSdkCallContext
) instead of adding it to the args
. This would also avoid issues when e.g. the type of args
changes from tuple
to list
.
if self._configuration.get("sanitize_query"): | ||
attributes[ | ||
SpanAttributes.DB_STATEMENT | ||
] = _conv_params_to_sanitized_str(self._call_context.params) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
afaik there isn't a specification yet how the db.statement
should look like for DynamoDB.
i think this would be defined first to so that all techs (Js, Ruby, .NET, ....) use a commonly understood format.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it then even make sens to add statement sanitization if the format isn't specced yet? Also when considering that there doesn't seem to be a common query language for DynamoDB (e.g SDK input parameters vs PartiQL).
Personally i don't have a strong opinion about statement capturing/sanitization and i'm just pointing out that it might lead to breaking changes in the future.
If you think this is ok feel free and continue with merging this PR :)
@mariojonke approve if your comments are addressed or request changes to prevent the accidental merge. |
...y-instrumentation-botocore/src/opentelemetry/instrumentation/botocore/extensions/dynamodb.py
Outdated
Show resolved
Hide resolved
@nemoshlag are there any updates on the comments from @mariojonke? |
Description
Added an optional query sanitizer to the botocore dynamoDB instrumentation.
Usage
BotocoreInstrumentor().instrument(sanitize_query=True)
This will affect the DB_STATEMENT value to contain the original query or sanitized one.
Fixes open-telemetry/opentelemetry-specification#1544
Following the specification discussion here
Type of change
Please delete options that are not relevant.
How Has This Been Tested?
Does This PR Require a Core Repo Change?
Checklist: