Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider pulling in more url.* fields from ECS as part of alignment #3405

Open
trask opened this issue Apr 17, 2023 · 5 comments
Open

Consider pulling in more url.* fields from ECS as part of alignment #3405

trask opened this issue Apr 17, 2023 · 5 comments
Assignees
Labels
spec:trace Related to the specification/trace directory

Comments

@trask
Copy link
Member

trask commented Apr 17, 2023

Consider pulling in some or all of:

  • url.domain
  • url.extension
  • url.fragment
  • url.original
  • url.password
  • url.port
  • url.registered_domain
  • url.subdomain
  • url.top_level_domain
  • url.username

Based on @AlexanderWert's #3355 (comment)

So, I would suggest if there are no strong reasons for leaving out certain fields from ECS (especially when they are not conflicting with others in OTel), let's try to adopt them. Otherwise, it would imply unnecessary breaking changes on ECS-based data.

And @tigrannajaryan's #3355 (comment):

I believe logs should share many (most) of the semantic conventions with traces. There may be exceptions to this, but I would expect that the vast majority of conventions are the same for traces and logs. This includes http conventions, I don't see why logs need to have different conventions.

@lmolkova
Copy link
Contributor

Logs, metrics, and traces should share common attributes, but most of the attributes can only be applicable to logs given they're verbose, sometimes duplicate information, and require additional logic/resources to populate them.

We should either bring attributes which are not used for traces/metrics as log-only attributes (initially) or make them opt-in by default.

@trask
Copy link
Member Author

trask commented Apr 18, 2023

Some initial thoughts after discussing in HTTP semconv stability WG today:

  • url.fragment - this is missing today and we will add it HTTP semconv
  • url.password and url.username - we could add these to HTTP semconv as Opt-In attributes (maybe also as deprecated? since these url components have been deprecated)
  • url.extension - since this can be derived from url.path / url.full, we probably wouldn't want to capture it by default, but it could be added to HTTP semconv as an Opt-In attribute
  • url.domain and url.port - HTTP semconv already captures these in server.domain and server.port (BREAKING: Replace net.peer.*/net.host.* with client.*/server.* (and source.*/destination.*) #3402). We would like to avoid multiple ways to capture the same thing since to make life easier on backends and queries/alerting. Some options: pull these over when they are needed for another semantic convention other than HTTP, or pull them in now as Opt-In attributes for HTTP semconv (as duplicates of server.domain and server.port in the context of HTTP semconv)
  • url.registered_domain, url.subdomain, url.top_level_domain - since these can be derived from url.domain, we probably wouldn't want to capture them by default, but they could be added to HTTP semconv as Opt-In attributes
  • url.original - I'm not sure what to do with this attribute. We can already see having both url.full and url.original is confusing. And we would prefer to force http client instrumentation to construct url.full and http server instrumentation to split out url.path/url.query instead.

@AlexanderWert
Copy link
Member

@trask Thanks for creating this separate thread to discuss this!

I think the above really depends a lot on how we see the semantic conventions, i.e. in a very concrete context / scope (such as HTTP spans) or rather generic and signal agnostic. I know it's quite a fundamental question / topic, but I feel discussing it would help ECS alignment efforts going forward.

First of all, I think that one of the biggest values of the semantic conventions is the correlation of different data points and signals (especially with OTel, apart from tracing, expanding more and more into metrics, logs and other signals). So I fully agree with @tigrannajaryan 's comment, i.e. different signals (especially traces and logs) should share as many semantic conventions as possible (unless there are good reasons not to do, such as cardinality for metrics, etc.). Therefore, IMHO per default we should consider attributes and and their namespaces as signal agnostic (and rather do exceptions for attributes that should not apply to certain signals, such as metrics, or should apply only to a specific signal).

For example, in your comment above:

url.domain and url.port - HTTP semconv already captures these in server.domain and server.port (#3402). We would like to avoid multiple ways to capture the same thing since to make life easier on backends and queries/alerting. Some options: pull these over when they are needed for another semantic convention other than HTTP, or pull them in now as Opt-In attributes for HTTP semconv (as duplicates of server.domain and server.port in the context of HTTP semconv)

This is reasonable when considering just HTTP spans as the scope. But, in some logging and security use cases mixing up url.* and server.* attributes might be confusing or even wrong. As the namespace implies, all the url.* attributes basically describe in detail a URL string. While server.* fields describe the responder in a network connection. In case of a non-HTTP connection, url.* might be not applicable, while server.* is.

Happy to discuss this topic in more detail!

I like the proposal of using the Opt-In option when modeling redundancy, though!

@tigrannajaryan
Copy link
Member

We should either bring attributes which are not used for traces/metrics as log-only attributes (initially) or make them opt-in by default.

Yes, sure they should be opt-in, no reason to make these additional attributes required. I maintain that we should aim for uniformness of conventions for logs and traces (exceptions from this need an explanation). Traces don't have to use these attributes, they are just optional attributes that can be used if the user choses so.

@trask
Copy link
Member Author

trask commented Apr 19, 2023

we discussed in HTTP semconv WG meeting today:

  • there appears to be wide agreement on attributes being cross-signal
  • it may be useful to have an "attribute dictionary", from which different semantic conventions can pull attributes in
  • potentially all attributes in the dictionary could be considered "Opt-In" across any semantic convention, which would alleviate having to clutter HTTP semconv with a bunch of Opt-In url attributes (and probably similarly affect other semconv)

I'm removing HTTP semconv stability blocker tag from this issue (I'll open a separate issue for url.fragment and tag that as an HTTP semconv stability blocker #3355 (comment)).

@trask trask moved this from Blocker for stability to Can be addressed after stability in Spec: HTTP Semantic Conventions Apr 19, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
spec:trace Related to the specification/trace directory
Projects
Status: Can be addressed after stability
Development

No branches or pull requests

5 participants