Add Redis instrumentation db_statement_serializer #1571

Closed

@tombruijn tombruijn commented Jan 10, 2023

Description

Add a user-configurable function, db_statement_serializer, that lets users sanitize the query stored in the db.statement attribute of a span.

This gives users a way to sanitize queries and not store any PII on the span.

I chose this user-function approach because the Node.js OpenTelemetry instrumentation packages use it as well.
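As a rough illustration of the hook described above, here is a minimal sketch. The PR text does not show the signature that db_statement_serializer receives, so the argument shape (a sequence holding the command name followed by its arguments) and the names `command_only` and `build_db_statement` are assumptions made up for this example, not the instrumentation's API.

```python
from typing import Callable, Optional, Sequence

# Assumed shape of the user hook: receives the raw command arguments
# and returns the string to store in the db.statement attribute.
Serializer = Callable[[Sequence[object]], str]


def command_only(args: Sequence[object]) -> str:
    """Example user hook: keep only the Redis command name, drop all arguments."""
    return str(args[0]) if args else ""


def build_db_statement(
    args: Sequence[object], serializer: Optional[Serializer] = None
) -> str:
    """Hypothetical stand-in for the instrumentation's statement builder.

    Uses the user-supplied serializer when one is configured; otherwise
    reports the query unchanged, which matches the default described in this PR.
    """
    if serializer is not None:
        return serializer(args)
    return " ".join(str(arg) for arg in args)
```

With this sketch, `build_db_statement(["SET", "user:42", "secret"], command_only)` yields only the command name, while omitting the serializer leaves the query untouched.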

Type of change

Please delete options that are not relevant.

  • New feature (non-breaking change which adds functionality)

How Has This Been Tested?

Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration

  • tox -e test-instrumentation-redis

Does This PR Require a Core Repo Change?

  • No.

Checklist:

See contributing.md for styleguide, changelog guidelines, and more.

  • Followed the style guidelines of this project
  • Changelogs have been updated
  • Unit tests have been added
  • Documentation has been updated

@tombruijn tombruijn requested a review from a team January 10, 2023 09:59
avzis commented Jan 10, 2023

This will solve #1548

I saw that the JS instrumentation implements it this way, and the code is correct, but I have two concerns:

In this implementation the default behavior is to report the data as-is (as it does currently). I think the default should be to sanitize the data.

Also, I think requiring users to write a sanitization function themselves is quite complicated. Why not implement it as a boolean: either sanitize or not?

If a need for a custom sanitization function comes up in the future, we can add that functionality then.

tombruijn commented Jan 10, 2023

@avzis I agree that expecting users to write something themselves is not great. It can easily break things.

I can make the implementation that I documented the default, and expose it as a sanitize_query: true option (suggestions for better names are welcome).

I can send in a new PR for that :)

avzis commented Jan 10, 2023

@tombruijn I think that would be great :)

tombruijn added a commit to tombruijn/opentelemetry-python-contrib that referenced this pull request Jan 10, 2023
Add a Redis query sanitizer. This can be disabled with the
`sanitize_query = False` config option.

Given the query `SET key value`, the sanitized query becomes `SET ? ?`.
Both the keys and values are sanitized, as both can contain PII data.

The Redis queries are sanitized by default. This changes the default
behavior of this instrumentation. Previously it reported unsanitized
Redis queries.

This was previously discussed in the previous implementation of this PR
in PR open-telemetry#1571

Closes open-telemetry#1548
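
To make the `SET key value` to `SET ? ?` rule above concrete, here is a minimal sketch of a boolean-controlled sanitizer. Only the `sanitize_query` option name and the expected output come from the commit message; the function name `format_statement` and its argument shape are assumptions for illustration, and this is not the code from the referenced commit.

```python
from typing import Sequence


def format_statement(args: Sequence[object], sanitize_query: bool = True) -> str:
    """Build a db.statement string from a Redis command and its arguments.

    When sanitize_query is True, the command name is kept and every key
    and value after it is replaced with "?", since both may contain PII.
    """
    if not args:
        return ""
    command = str(args[0])
    if sanitize_query:
        rest = ["?"] * (len(args) - 1)
    else:
        rest = [str(arg) for arg in args[1:]]
    return " ".join([command] + rest)
```

The default of `True` mirrors the behavior described in this commit message; passing `sanitize_query=False` reports the query unchanged.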
tombruijn added a commit to tombruijn/opentelemetry-python-contrib that referenced this pull request Jan 10, 2023
Add a query sanitizer to the Redis instrumentation. This can be disabled
with the `sanitize_query = False` config option.

Given the query `SET key value`, the sanitized query becomes `SET ? ?`.
Both the keys and values are sanitized, as both can contain PII data.

The Redis queries are sanitized by default. This changes the default
behavior of this instrumentation. Previously it reported unsanitized
Redis queries.

This was previously discussed in the previous implementation of this PR
in PR open-telemetry#1571

Closes open-telemetry#1548
@tombruijn
Contributor Author

@avzis I created a new PR in #1572. Closing this one.

@tombruijn tombruijn closed this Jan 10, 2023
srikanthccv added a commit that referenced this pull request Feb 4, 2023
* Add Redis instrumentation query sanitization

Add a query sanitizer to the Redis instrumentation. This can be disabled
with the `sanitize_query = False` config option.

Given the query `SET key value`, the sanitized query becomes `SET ? ?`.
Both the keys and values are sanitized, as both can contain PII data.

The Redis queries are sanitized by default. This changes the default
behavior of this instrumentation. Previously it reported unsanitized
Redis queries.

This was previously discussed in the previous implementation of this PR
in PR #1571

Closes #1548

* Update Redis sanitize_query option documentation

Changes suggested in
#1572 (comment)

* Remove uninstrument & instrument from test setup

The Redis test that runs with the default options doesn't need to
uninstrument and then re-instrument the instrumentor. This commit
removes the unnecessary setup code, which is already present at the top
of the file.

* Fix code style formatting

* Update Redis functional tests

- Update the sanitizer to also enforce a maximum `db.statement`
  attribute value length of 1000 characters.
- Update the functional tests to assume the queries are sanitized by
  default.
- Add new tests that cover the behavior with sanitization turned off.
  These are added only for the first test class. I don't think it's
  necessary to duplicate them for the clustered and async setups.

* Test Redis unsanitized queries by default

Change the Redis functional tests so that they test the unsanitized
query by default, and test the sanitized query results in the separate
test functions.

This is a partial revert of the previous commit
8d56c2f

* Fix formatting issue in Redis utils

* Disable Redis query sanitization by default

Update the Redis instrumentation library to not change the default
behavior for the Redis instrumentation. This can be enabled at a later
time when the spec discussion about this topic has concluded.

open-telemetry/opentelemetry-specification#3104

* Fix pylint issue

Remove else statement.

* Update changelog about Redis query sanitization default

[ci skip]

Co-authored-by: Srikanth Chekuri <srikanth.chekuri92@gmail.com>

* Fix potential error on Redis args being 0

Check the length of the args array and return an empty string if there
are no args.

That way it won't cause an IndexError if the args array is empty and it
tries to fetch the first element, which should be the Redis command.

---------

Co-authored-by: Srikanth Chekuri <srikanth.chekuri92@gmail.com>
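
The commit list above mentions two safeguards: capping the `db.statement` value at 1000 characters and returning an empty string when there are no args. A minimal sketch of how the two could fit together follows. The function name `sanitize_and_trim` and the `...` truncation marker are assumptions for illustration; the commit messages state only the 1000-character limit and the empty-string result.

```python
from typing import Sequence

MAX_STATEMENT_LENGTH = 1000  # limit stated in the commit message
TRUNCATION_MARK = "..."      # assumed marker, not taken from the source


def sanitize_and_trim(
    args: Sequence[object], max_length: int = MAX_STATEMENT_LENGTH
) -> str:
    """Sanitize a Redis command and cap the resulting statement length."""
    # Guard first: indexing args[0] on an empty sequence would raise IndexError.
    if len(args) == 0:
        return ""
    statement = " ".join([str(args[0])] + ["?"] * (len(args) - 1))
    if len(statement) > max_length:
        # Leave room for the marker so the result is exactly max_length long.
        statement = statement[: max_length - len(TRUNCATION_MARK)] + TRUNCATION_MARK
    return statement
```

Checking the length before reading `args[0]` is what avoids the `IndexError` described in the last commit.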