-
Notifications
You must be signed in to change notification settings - Fork 186
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[resource/host] Add semantic convention for IP addresses of a host #203
Conversation
Do we need the explicit differentiation of Also, in ECS we have the cc @ChrsMark |
We discussed this in the system semantic convention WG as well as in #131 (comment). Personally the main benefit I see is that we make semantics of the attributes more specific; I have been bit by very general attributes like Discussed #131 (comment)
I find it at least possible that |
The v4/v6 name split is also inconsistent with our network.* semantic conventions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I want to call out also on the PR that AFAIK we intentionally left out these attributes because we want to delay defining them until after we sort out how to implement mutable/added-after-start resource attributes: open-telemetry/opentelemetry-specification#1298
Personally I think given the long dormancy of that issue and the usefulness of capturing IPs, it may be OK to ignore that. But maybe this would also be a good opportunity to get the discussion there going again?
Would you mind elaborating on which conventions are inconsistent with these and how?
I think this issue is important but I don't see how this relates to these attributes specifically. The main way we would add these today would be through an OpenTelemetry Collector processor (there are several processors that can modify the resource of telemetry passing through the Collector), from a brief reading of the issue I don't think the issue covers this case but is rather talking about SDKs |
One benefit of being "ambiguous" here is that querying and interoperating with the data is easier, i.e. does not require knowledge about the IP version. For example, let's say you want to correlate data by an IP. With the above proposal, before querying data you'd need to parse the value of an IP first to understand which version it is, before being able to query on the right attribute ( Attributes with comparable, intentional ambiguity are
Interesting, haven't thought about that. Do you have examples of additional attributes that we'd like to have under an |
@mx-psi See here: https://github.com/open-telemetry/semantic-conventions/blob/main/docs/general/attributes.md#network-attributes Especially (just now while listing these, I stumbled upon / created #206) |
@AlexanderWert (also @Oberon00) My main goal was to ensure that if someone wants to determine whether an address is IPv4/IPv6 they can unambiguously recover that information. I think Alex's is a fair point and prior attributes are also a good argument against splitting per version. A compromise would be to have a single attribute but provide strict guidance on what the IP format should be for each version (current wording includes references to "quad-dotted notation" and to RFC 5952, that seems enough to me) so that one can still recover the IP version. Would that be okay with you?
I don't have anything particular in mind (maybe someday we want to express CIDR blocks? or something about IP headers?), but I wouldn't want my lack of imagination to put us in the future in a spot where we have to contradict the spec guidance when we can easily avoid it. |
Hey @mx-psi, I'm +1 for the one single attribute ( |
I merged both attributes into Do we have any guidance regarding plurals in attribute names? For example, we have |
Hi Folks,
+1 In general, it’s best for existing users of ECS if we maintain compatibility (field name equivalence) with widely deployed ECS fields, like
I hope we can avoid all situations where we might add an extension (like
ECS provides a set of guidelines for field names, which includes the rather obvious “Use singular and plural names properly to reflect the field content. For example, use If we anticipate that some systems adopting Otel SemConv will not support a combined IPv4/IPv6 field data type, then we could entertain proposals to add new fields, such as |
/easycla |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. I have filed open-telemetry/opentelemetry-specification#3641 to further discuss and clarify the pluralization guidelines.
I generally also agree that having attributes separated by version would hurt how one search for it - you would need to either know which version is being used OR do a query with both and see which one pop up with data so having a single seems better to me. What I'm not so sure is using a singular term for an array, since that goes against our recommendation. Adding an exception just opens up for more interpretations to "what is wildly known" (https://github.com/open-telemetry/opentelemetry-specification/pull/3641/files). In another hand, we said we would take ECS fields and using something other than |
If we compromise ECS, I would favor |
Sorry for the delay, catching up from some time off
If our focus is on the success of analysts interacting with our data, which I think it is, then we should favor exceptions for widely-known terms over strict-adherence to rules like all array type field names must use plural form. While strict adherence can have some advantages when it comes to automated validation of fields in a schema, in this case, since English language supports irregular plurals, even a simple automated check such as “if type = array, check name ends in ’s’” will require an exception path for irregular plurals, for example, child-children, mouse-mice, radius-radii, etc., so if we’d need to have exceptions for automated checks anyway, it makes more sense to apply the exception in favor of the consumer of our data.
From the Elastic side, we are strongly in favor of making the exception to the plural attributes naming convention, and avoiding the breaking change. We think that current and future users will appreciate the consistency.
If we can’t agree with the above recommendation to make the exception to the plural attributes naming convention, and we feel we must make a breaking change for the thousands of users already using ECS
|
I want to discuss this on next week's Specification SIG meeting, the main point of contention seems to be if/how much we should deviate from ECS, and I think a general framework like what is discussed on #181 (comment) would help move this forward. Please attend if you want/can so we can move this foward :) |
So I see 3 options here: A) we use Pros/Cons per optionA)
|
I don't have a horse in this race, but based on the pros/cons, I it seems logical to suggest: D)
|
Thank's @Oberon00 for chiming in!
That one sounds good to me! Looks like a hybrid 5th option here that covers 2 of the 3 main trade-offs we are discussing: ++
--
What do the rest of @open-telemetry/semconv-system-approvers think here? |
I haven't publicly thanked @ChrsMark for the summary so thanks again! :) and I just want to state that I also feel like |
@open-telemetry/specs-semconv-approvers @open-telemetry/specs-semconv-maintainers We need more reviews on this, given the discussions about ECS and etc. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Damn I guess I am a bit late.
I still think host.ip.addresses would be slightly better in terms of future-proofing but I understand conciseness is also a factor here. I left the wording regarding formatting of IPv4 and IPv6 addresses.
I hope we can avoid all situations where we might add an extension (like
.addresses
) to an existing ECS field. This provides two benefits to our users 1) a consistent practice where a “leaf” field name is always a leaf field name. (e.g.,host.ip
,source.ip
,client.ip
,destination.ip
, etc.). - this helps analysts remember and/or deduce field names. 2) It avoids potential field data type mismatches in Elasticsearch where a given field name cannot be an object and a “leaf” field in the same index. During the development of ECS, we made one such error, makinghost
an object (i.e., the leading part ofhost.ip
), when previously Logstash usedhost
as a “leaf” field name. We had to deal with reconciling these mismatches for years in mixed user environments.
While I see this point, I am a bit confused. Isnt the current spec promoting fields like client.address
or server.address
- I noticed this from the http space open-telemetry/opentelemetry-specification#3163? In that case its clear that we expect a single address. And here / in ECS e.g. destination and server provide an address
field. I assume that a spec consistent with its recommendations would also help e.g. analyst too.
Thanks @frzifus, please let me try to clarify, my point above is not objecting generally to the field names |
**Description:** Adds detection for `host.ip`. Relates to open-telemetry/semantic-conventions/pull/203 Fixes #24765
Addshost.ipv4.addresses
andhost.ipv6.addresses
resource semantic conventions to specify the IPv4 and IPv6 addresses of a host.Adds
host.ip
resource attribute for specifying the IPs of a host, ensuring well-recognized formats for IPv4 and IPv6 addressesUpdates #131 (missing
host.mac
)Reference implementation on
system
detector: open-telemetry/opentelemetry-collector-contrib/pull/24450