Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Sensitive Data Handling #100
base: main
Are you sure you want to change the base?
Sensitive Data Handling #100
Changes from 3 commits
bc78353
b95e94c
509e83b
7349386
f142c93
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the Java auto-instrumentation mentioned as an example or does this only apply to the Java auto-instrumentation? Please clarify.
If this is about all kinds of instrumentations, do you intend to redact all set attributes by default?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sticking with the sql example, it would also apply to the node.js instrumentation in opentelemetry-js, including the opentelemetry-plugin-mysql and opentelemetry-plugin-postgres packages. It doesn't look like the nascent .NET auto agent would be affected (yet).
No, this wouldn't apply a single rule to all set attributes. We assume manual instrumentation knows what it is doing. Since we are writing the instrumentation for auto-instrumenters, we are aware of the semantics of the attribute values and can apply appropriate rules (e.g., we know that the numbers in
http.status_code
are not risks, but that a database connection string might contain a password.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this work for all sorts of SQL dialects out there?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, a sql lexer can easily broadly apply to many vendors' dialects, because it's only interested in tokenization and not in syntax/grammar.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see how this can possibly be done efficiently inside the language SDKs, as opposed to scrubbing this in the ingestion pipeline.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only seeing this otep now as it was linked to from another newer otep -- I'd like to see something like this added so going over this one now.
I don't think this is a good example since the db statement should already be parameterized, like an extended query in postgres. Are there many databases that don't support a form of query + args like found in SQL databases?
So for db instrumentation we should document to the user that they should not manually construct queries with dynamic data in general but most importantly for the case of instrumentation, to not do it with sensitive data.
And instead the query parameter attributes need to be either not included at all or have the ability to be filtered by key.
This would require no SQL parsing library and be doable in all of the implementations.
Of course best practices aren't always an option so having the SQL parsing that can scrub all predicates in the collector would be a must have for some.
I also think this relates to span processors and how they need updating to provide functionality like this. I planned for that to be a follow up that builds on my OTEP #186 but since it has not been met with a positive reception so far I may have to rethink that :)
This OTEP is about a "broad agreement" and I agree with that but put the rest here because I think the broad agreement should include span transformations in the export pipeline that doesn't yet exist, and that one of those transformations should be a required configurable attribute scrubber.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this something to be configured in the UI of a backend or at the place where the instrumentation is set up? Are these bold red warnings supposed to show up in a log produced by the instrumentation or in the tracing backend's UI?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Config would remain local to each library (a manual wrapper might provide a dictionary of options as an API parameter, an auto agent might use an environment variable or config file). The warning would appear in the documentation of this config knob.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm skeptical about the performance impact on the host application, somewhat, but I'm even less convinced that we're going to build out scrubbers for PII in every client language.
This seems to suggest a configuration where an OTel collector is co-located in the side-car configuration, so that it resides in the same security domain. Then the statement is "yes, by default we scrub the credit card number immediately after it crosses outside your process a locally running sidecar."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1. Making the promise that scrubbing will be done in-process in every language is a very large scope creep. Doing it once in a sidecar or central tier is much more sustainable and manageable.