`messaging.client_id` -> `messaging.client.id` rename causes issues with code generation #1031

dyladan · 2024-05-10T17:13:24Z

Area(s)

area:messaging

What happened?

Last week in #948 messaging.client_id was renamed to messaging.client.id. In the JS code generator, we use {{ attribute.fqn | to_const_name }} to generate variable names. This results in conflicting constants with the same name MESSAGING_CLIENT_ID. I'm not sure anything can be done about this, but wanted to raise it to the semconv group as this is the first time I've seen such a conflict.
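A minimal sketch of the collision, assuming `to_const_name` simply upper-cases the name and turns `.` into `_` (an assumption about the filter's behaviour, not a copy of the real implementation):

```python
def to_const_name(fqn: str) -> str:
    """Assumed filter behaviour: upper-case the name and map '.' to '_'."""
    return fqn.upper().replace(".", "_")


old_name = to_const_name("messaging.client_id")  # deprecated attribute
new_name = to_const_name("messaging.client.id")  # its replacement

# Both attributes end up with the same constant name.
print(old_name, new_name, old_name == new_name)
```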

Semantic convention version

main

Additional context

We're currently updating our semconv package. We continue to export deprecated attributes in order to make changes to the package non-breaking. The conflicting names get in the way of the code generator. I don't want to special-case the name for a single attribute if I can avoid it, and the "good" name is currently squatted on by the old deprecated attribute.


dyladan · 2024-05-10T17:14:58Z

Looks like this change has not yet been released so there is still time to do something to avoid the collision. I would greatly appreciate it if it could be taken into consideration with the new name.

trask · 2024-05-10T20:43:20Z

it looks like this has happened before:

messaging.message_id -> messaging.message.id
messaging.kafka.message_key -> messaging.kafka.message.key
messaging.rocketmq.message_type -> messaging.rocketmq.message.type
messaging.rocketmq.message_tag -> messaging.rocketmq.message.tag
messaging.rocketmq.message_keys -> messaging.rocketmq.message.keys

but it hasn't caused a codegen problem yet because the deprecated attributes were completely dropped from the YAML files in these cases

dyladan · 2024-05-10T21:24:34Z

I'd be interested to see if this affects other language code generators

trask · 2024-05-10T22:54:20Z

Yeah I think it's likely cc @jack-berg

lquerel · 2024-05-21T15:17:15Z

It's interesting to note that from the perspective of the user of the generated semconvs, this scenario is ideal because it does not require changing references to these constants. Wouldn't one possible approach be to consider that in the case of such a conflict, only the non-experimental version is retained?

trask · 2024-05-21T15:26:33Z

> Wouldn't one possible approach be to consider that in the case of such a conflict, only the non-experimental version is retained?

I agree this may be the best option

> It's interesting to note that from the perspective of the user of the generated semconvs, this scenario is ideal because it does not require changing references to these constants.

the non-ideal part is that it will automatically change (some of) the emitted telemetry to a newer schema version while the instrumentation is still emitting an older schema version url

MrAlias · 2024-05-21T20:14:23Z

This looks to be affecting the Go code generation: https://github.com/open-telemetry/opentelemetry-go/actions/runs/9180409074/job/25244759934?pr=5394

jsuereth · 2024-05-21T21:33:45Z

Previously, when making renames of this variety, we DROPPED the old attribute. Now we do not.

I see a few paths forward here:

1. We update uniqueness rules in semconv to be stricter - i.e. we know common codegen does NOT distinguish between . or _ so we update a name conflict policy to catch this ahead of time. This will avoid the conflict in the future, but still requires attention to the issue today.
2. We update codegen to use the more recent value (Stable overrides others, Experimental overrides Deprecated, etc.) This has issues to it as well.
3. We remove the old attribute. Effectively, we consider a rename of . to _ or vice-versa a non-breaking change. I don't think this is a valid option, just listing it so folks know we thought of it.
4. We update codegen constant naming to distinguish . and _ in some fashion, and require languages to make this distinction. This may be a viable LONG term direction if we make new codegen artifacts for semconv across languages, but this does not fix any short term issues with libraries that want to make stability guarantees.
5. We back out the change that broke codegen and re-introduce it once we have a plan for stable codegen.
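As an illustration of the second option above (codegen keeps the most recent attribute on a collision), here is a rough sketch. The `PRECEDENCE` ranking, the `(fqn, stability)` tuples and `resolve_collisions` are hypothetical, and `to_const_name` is an assumed stand-in for the real filter:

```python
# Hypothetical ranking: a higher value wins when two constant names collide.
PRECEDENCE = {"deprecated": 0, "experimental": 1, "stable": 2}


def to_const_name(fqn: str) -> str:
    """Assumed filter behaviour: upper-case the name and map '.' to '_'."""
    return fqn.upper().replace(".", "_")


def resolve_collisions(attributes):
    """Map each constant name to the attribute with the highest precedence.

    `attributes` is a list of (fqn, stability) tuples.
    """
    winners = {}
    for fqn, stability in attributes:
        name = to_const_name(fqn)
        current = winners.get(name)
        if current is None or PRECEDENCE[stability] > PRECEDENCE[current[1]]:
            winners[name] = (fqn, stability)
    return {name: fqn for name, (fqn, _) in winners.items()}


resolved = resolve_collisions([
    ("messaging.client_id", "deprecated"),
    ("messaging.client.id", "experimental"),
])
print(resolved)
```

Note that the constant keeps its name while the string behind it changes, which is the "issue" with this option: telemetry can change without the user touching their code.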

In any case, I'll take the blame for not having more discussion of this issue prior to release. It's my opinion that the correct short (and longer term fix) will be on the codegen side. I shouldn't have forced that decision though.

brettmc · 2024-05-22T03:41:10Z

Also affecting PHP codegen, for the same reasons as others: {{ attribute.fqn | to_const_name }} ends up with the same const name.
For the time being, I can manually fix it by removing the deprecated one.

AlexanderWert · 2024-05-22T13:00:21Z

In SemConv we differentiate between . and _, in code generation usually not. So I'd say conceptually it's an issue with code gen that needs a cleaner long-term solution. IMHO, we should aim for Option 4 long term:

> We update codegen constant naming to distinguish . and _ in some fashion, and require languages to make this distinction. This may be a viable LONG term direction if we make new codegen artifacts for semconv across languages, but this does not fix any short term issues with libraries that want to make stability guarantees.

I don't think Option 1 is a good solution for SemConv, as . and _ have clear, separate semantics (i.e. namespace separation vs. attribute name). By treating those as "the same thing" regarding uniqueness we would mix these semantics and make it even more confusing.

So, for the short term, that would leave us (IMHO) with options 2 or 5.

trask · 2024-05-22T14:56:23Z

maybe:

messaging.client_id -> MESSAGING_CLIENTID
messaging.client.id -> MESSAGING_CLIENT_ID

it's not perfect, and we could still end up back here if there's a rename from messaging.clientid to messaging.client_id (or vice-versa), but it's probably a lot less likely than the more common renames that we have been doing:

messaging.message_id -> messaging.message.id

messaging.kafka.message_key -> messaging.kafka.message.key

messaging.rocketmq.message_type -> messaging.rocketmq.message.type

messaging.rocketmq.message_tag -> messaging.rocketmq.message.tag

messaging.rocketmq.message_keys -> messaging.rocketmq.message.keys
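A small sketch of this mapping, assuming it amounts to "drop `_`, then upper-case and turn `.` into `_`" (the function name is made up for illustration):

```python
def to_const_name_drop_underscore(fqn: str) -> str:
    """Drop '_' from the attribute name, then upper-case and map '.' to '_'."""
    return fqn.replace("_", "").upper().replace(".", "_")


print(to_const_name_drop_underscore("messaging.client_id"))  # old name
print(to_const_name_drop_underscore("messaging.client.id"))  # new name
```

The remaining collision mentioned above is visible here too: `messaging.clientid` and `messaging.client_id` produce the same constant.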

jsuereth · 2024-05-22T15:09:23Z

We had some discussion on this in the semconv tooling group. I think there's a few options for how to rename keys, particularly:

- You can migrate . to _ and _ to __.
- Go, today, erases to camelcase. We could think about preserving _ and having . turn into camel-case otherwise.
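The first of these options could look roughly like the following sketch (the function name is hypothetical; upper-casing is assumed for SCREAMING_SNAKE_CASE languages):

```python
def to_const_name_double_underscore(fqn: str) -> str:
    """Map '_' to '__' and '.' to '_', then upper-case the result."""
    # Double the existing underscores first, so the underscores that
    # replace '.' afterwards are not doubled as well.
    return fqn.replace("_", "__").replace(".", "_").upper()


print(to_const_name_double_underscore("messaging.client_id"))
print(to_const_name_double_underscore("messaging.client.id"))
```

This is not fully collision-free either: names where an underscore sits directly next to a dot, such as `a._b` and `a_.b`, still map to the same constant.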

No matter the path forward, we're going to pull together a quick statistic of how many _ we have in semconv to better understand the potential issues/impact going forward for not disambiguating . and _ in codegen.

MadVikingGod · 2024-05-22T15:36:40Z

Follow-up from the SC tooling meeting: It was asked how many attributes currently have an underscore, to measure how the above changes might affect generated code:

$ cd model/registry
$ yq '.groups[].attributes[].id' *.yaml -r  | grep '_' | sort | uniq | wc -l
121

This is in addition to a number of prefixes that contain _.

MadVikingGod · 2024-05-22T15:41:30Z

I would also consider there are a number of attributes that might become a namespace of another attribute if we convert _ to .. For example we have attributes:

"container.command"
"container.command_args"
"container.command_line"
"http.request.method"
"http.request.method_original"

A blanket rewriting could make container.command both an attribute and a language's namespace for container.command.args, and the same for http.request.method and http.request.method_original.
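One way to look for such cases ahead of time is sketched below. `namespace_conflicts` is a hypothetical helper; it assumes a blanket `_` to `.` rewrite and reports every rewritten name that becomes a prefix namespace of another one:

```python
def namespace_conflicts(attribute_ids):
    """Return (attribute, nested_attribute) pairs after rewriting '_' to '.'.

    A pair is reported when the first rewritten name is both an attribute
    and the namespace that contains the second one.
    """
    rewritten = sorted({attr.replace("_", ".") for attr in attribute_ids})
    conflicts = []
    for name in rewritten:
        prefix = name + "."
        for other in rewritten:
            if other.startswith(prefix):
                conflicts.append((name, other))
    return conflicts


conflicts = namespace_conflicts([
    "container.command",
    "container.command_args",
    "container.command_line",
    "http.request.method",
    "http.request.method_original",
])
for parent, child in conflicts:
    print(parent, "is also a namespace for", child)
```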

I think we might want to make a more flexible template, maybe one that lets the users of codegen specify how the normalization works best for their language.

dyladan · 2024-05-22T19:25:19Z

> > Wouldn't one possible approach be to consider that in the case of such a conflict, only the non-experimental version is retained?
>
> I agree this may be the best option

That sounds nice but we're getting 2 generated constants that conflict with each other and cause compilation issues. We would have to then post-process the generated code to remove conflicts, which seems clunky at best. If the generator could handle these collisions on its own maybe it would be ok

> > It's interesting to note that from the perspective of the user of the generated semconvs, this scenario is ideal because it does not require changing references to these constants.
>
> the non-ideal part is that it will automatically change (some of) the emitted telemetry to a newer schema version while the instrumentation is still emitting an older schema version url

I agree we don't want the telemetry to change out from under the user. That seems likely to result in telemetry that doesn't match what its schema url claims.

> We had some discussion on this in the semconv tooling group. I think there's a few options for how to rename keys, particularly:
>
> - You can migrate . to _ and _ to __.
> - Go, today, erases to camelcase. We could think about preserving _ and having . turn into camel-case otherwise.

Both of these examples seem the opposite of what I would expect. __ seems like more separation than _ so I'd think . would become __ if anything, and I'm not sure I fully understand how the second one works. To me it seems like it would be better to CamelCase _ and turn . to _ if anything. My biggest issue there is that many languages already have casing conventions for constants so relying on case seems likely to cause issues there.

I was surprised to see 1.26 released with this known issue. Will there be a 1.26.1 to rectify it?

dyladan · 2024-05-22T19:27:28Z

> Also affecting PHP codegen, for the same reasons as others: {{ attribute.fqn | to_const_name }} ends up with the same const name. For the time being, I can manually fix it by removing the deprecated one.

@brettmc keep in mind you're running into the situation mentioned above where users are going to have telemetry changed underneath them without realizing it. I'd caution against this.

jsuereth · 2024-05-22T19:35:28Z

1.26 was released assuming this is a codegen specific issue as we've made renames like this in the past. (see my apology above for making the decision, perhaps preemptively).

I still think this is an issue with codegen, but I'm asking the other semconv maintainers their opinion on backing off the change for now until a solution is found.

jsuereth · 2024-05-22T19:36:58Z

cc @open-telemetry/specs-semconv-maintainers

dyladan · 2024-05-22T19:45:23Z

> I still think this is an issue with codegen, but I'm asking the other semconv maintainers their opinion on backing off the change for now until a solution is found.

It would be nice if this could be handled by codegen, but keep in mind that changing the way the codegen works is thorny for languages which already have released stable semconv packages. It means likely deprecating all old names and moving to the new style, which results in a lot of unneeded work in instrumentations to follow the new naming scheme.

lmolkova · 2024-05-22T20:14:10Z

> It would be nice if this could be handled by codegen, but keep in mind that changing the way the codegen works is thorny for languages which already have released stable semconv packages. It means likely deprecating all old names and moving to the new style, which results in a lot of unneeded work in instrumentations to follow the new naming scheme.

Great point! It seems JavaScript is the only affected language. Given that it uses old tooling/templates and separates resource/other attributes into two different files, would it be fair to say that some breaking changes are inevitable there @dyladan ?

If so, this and other changes can be batched together and released as semconv v2 package.

Since (it seems) the cost of breaking is still low, I think we should disambiguate and make sure that different attributes are guaranteed to have different constant names.

The alternative I see is to tolerate the downside @trask brought up

> the non-ideal part is that it will automatically change (some of) the emitted telemetry to a newer schema version while the instrumentation is still emitting an older schema version url

We should never rename a stable attribute and this would be a minor disturbance for experimental ones.

Still it might be surprising for users that their query no longer works even though the attribute constant name has not changed and I'd prefer to fix it if we can.

dyladan · 2024-05-22T20:26:57Z

> > It would be nice if this could be handled by codegen, but keep in mind that changing the way the codegen works is thorny for languages which already have released stable semconv packages. It means likely deprecating all old names and moving to the new style, which results in a lot of unneeded work in instrumentations to follow the new naming scheme.
>
> Great point! It seems JavaScript is the only affected language. Given that it uses old tooling/templates and separates resource/other attributes into two different files, would it be fair to say that some breaking changes are inevitable there @dyladan ?
>
> If so, this and other changes can be batched together and released as semconv v2 package.

This also affects PHP and Go at least. I suspect it also affects others. Separating resource/other attributes into separate files is an unrelated issue though. The issue is that we need both old and new names in order to handle the double-emit telemetry for the compatibility story.

JS is already planning to change how we generate semconv in the future (PR: open-telemetry/opentelemetry-js#4690). We're going to keep the old names around and mark them as deprecated, but the new names are causing this problem. See #1064 to see how we're generating the new names. I believe both the old and new generation scheme would have the same problems though.

> Since (it seems) the cost of breaking is still low, I think we should disambiguate and make sure that different attributes are guaranteed to have different constant names.
>
> The alternative I see is to tolerate the downside @trask brought up
>
> > the non-ideal part is that it will automatically change (some of) the emitted telemetry to a newer schema version while the instrumentation is still emitting an older schema version url
>
> We should never rename a stable attribute and this would be a minor disturbance for experimental ones.
>
> Still it might be surprising for users that their query no longer works even though the attribute constant name has not changed and I'd prefer to fix it if we can.

I'm not sure I agree that the cost of the break is "low" because the level of surprise would be quite high if we changed names out from under users without them making code changes.

My preferred fix would be to disallow any and all collisions, including with deprecated names, where non-alphanumeric characters are treated the same. For example messaging.client.id and messaging.client_id would be considered a disallowed collision.
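A check along these lines could be sketched as follows. `find_disallowed_collisions` is a hypothetical helper, and the normalization (every non-alphanumeric character mapped to the same placeholder) is an assumption about what "treated the same" means:

```python
import re
from collections import defaultdict


def find_disallowed_collisions(attribute_ids):
    """Group attribute ids that differ only in non-alphanumeric characters.

    The id list should include deprecated names as well.
    """
    groups = defaultdict(list)
    for attr in attribute_ids:
        # Treat every non-alphanumeric character as the same separator.
        key = re.sub(r"[^a-z0-9]", "_", attr.lower())
        groups[key].append(attr)
    return [sorted(names) for names in groups.values() if len(names) > 1]


collisions = find_disallowed_collisions([
    "messaging.client.id",
    "messaging.client_id",
    "messaging.message.id",
])
print(collisions)
```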

MrAlias · 2024-05-22T20:36:40Z

> > Also affecting PHP codegen, for the same reasons as others: {{ attribute.fqn | to_const_name }} ends up with the same const name. For the time being, I can manually fix it by removing the deprecated one.
>
> @brettmc keep in mind you're running into the situation mentioned above where users are going to have telemetry changed underneath them without realizing it. I'd caution against this.

In Go we release separate versions of semconv as separate packages. Dropping deprecated values would be acceptable for us in this situation given a user will need to explicitly make the upgrade by switching packages.

lmolkova · 2024-05-22T20:41:38Z

@dyladan If I understand your reply, we're talking about the same solution:

- No collisions in generated code. messaging.client.id and messaging.client_id should have different constant names. This would prevent future collisions.
- This would result in breaking changes to existing semconv libraries.
  - Most of them are still not stable and can do it.
  - JS plans some breaking changes and renaming HTTP_REQUEST_ORIGINAL_METHOD to HTTP_REQUEST_ORIGINAL__METHOD (and similar) could be done along with them.
  - PHP would need to do breaking changes to semconv package too. (thank you for pointing it out)

marcalff · 2024-05-30T22:17:26Z

Opentelemetry-cpp was affected also:

[SEMANTIC CONVENTIONS] Upgrade to version 1.26.0 opentelemetry-cpp#2687

Generation for the old name was disabled in the template.

{#
  MAINTAINER:
  semconv "messaging.client_id" is deprecated
  semconv "messaging.client.id" is to be used instead
  Now, because we use k{{attribute.fqn | to_camelcase(True)}},
  both names collide on C++ symbol kMessagingClientId.
  Do not generate code for semconv "messaging.client_id"
#}

alxbl · 2024-05-31T18:13:33Z

Adding to this:

I started writing a codegen for C# and I'm running into the same issue. The only way around it given the information at the time of rendering the template is to disable rendering of any deprecated attributes to avoid name clashes.

lmolkova · 2024-05-31T19:49:39Z

@alxbl @marcalff

it should be possible to modify the function that generates constant names. It will affect existing attributes other than messaging.client_id, but it will prevent future collisions, which are likely to happen.

The tooling will provide the proper function for it, so this would be a workaround.

What's important is to agree on the consistent formatting.

For languages that use camelCase or PascalCase it could probably be formatted as MessagingClient_Id (for messaging.client_id) and MessagingClientId (for messaging.client.id)

it can be achieved with existing tooling with a macro similar to

{%- macro to_const_name_v2(attr_name) -%}
{%- set ns=namespace(up=True) -%}
{%- for l in attr_name -%}
{%- if ns.up -%}
{{l | upper}}
{%- elif l != '.' -%}
{{l}}
{%- endif -%}
{%- set ns.up=(l=='.' or l=='_') -%}
{%- endfor -%}
{%- endmacro %}
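For anyone who wants to check the macro's output outside of a template, here is a line-by-line Python transliteration (a sketch for experimentation only, not part of the tooling):

```python
def to_const_name_v2(attr_name: str) -> str:
    """Python port of the Jinja macro above.

    A '.' is dropped and upper-cases the next character; a '_' is kept
    and also upper-cases the next character.
    """
    out = []
    upper_next = True  # mirrors ns.up, so the first character is upper-cased
    for ch in attr_name:
        if upper_next:
            out.append(ch.upper())
        elif ch != ".":
            out.append(ch)
        upper_next = ch in (".", "_")
    return "".join(out)


for name in ("messaging.client_id", "messaging.client.id",
             "http.request.method_original"):
    print(name, "->", to_const_name_v2(name))
```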

In any case, please do share your thoughts on the format we should provide in tooling: whether MessagingClient_Id (or HttpRequestMethod_Original) is reasonable, or whether you'd prefer some other format that prevents collisions like this

marcalff · 2024-05-31T19:56:43Z

I think this needs more investigation.

For example, there are collisions between foo.barbaz and foobar.baz too.

lmolkova · 2024-05-31T21:05:47Z

> For example, there are collisions between foo.barbaz and foobar.baz too.

They will result in different names: FooBarbaz and FoobarBaz. Which is not great, but not a collision in most languages.

The alternative is to do something like . -> _ and _ -> __. I.e. Messaging_Client_Id and Messaging_Client__Id (which seems to work better for languages that use snake_case or SCREAMING_SNAKE_CASE for constants)

lmolkova · 2024-06-02T21:26:18Z

Note this also affects class names

user_agent.* (User_AgentAttributes) should be distinguishable from useragent.* (UserAgentAttributes).
db.cassandra.consistency_level enum (DbCassandraConsistency_LevelValues) should be distinguishable from db.cassandra.consistencylevel (DbCassandraConsistencyLevelValues)

marcalff · 2024-06-03T07:22:05Z

> > For example, there are collisions between foo.barbaz and foobar.baz too.
>
> They will result in different names: FooBarbaz and FoobarBaz. Which is not great, but not a collision in most languages.
>
> The alternative is to do something like . -> _ and _ -> __. I.e. Messaging_Client_Id and Messaging_Client__Id (which seems to work better for languages that use snake_case or SCREAMING_SNAKE_CASE for constants)

Correct, I missed the fact that the . is implicitly represented by an uppercase in camel case.

So, to summarize:

Semantic conventions:

messaging.client_id
messaging.client.id

can be generated as:

kMessagingClient_Id
kMessagingClientId

or as:

MESSAGING_CLIENT__ID
MESSAGING_CLIENT_ID

depending on the language style (CamelCase, UPPERCASE).

This is a breaking change for every semantic convention that contains a _ character, but seems viable in the long term to prevent collisions.

The breaking change cannot be avoided, by definition: the mapping for one of the colliding names has to change.

@lmolkova This solution will work for us (opentelemetry-cpp).

marcalff · 2024-06-03T13:08:21Z

@lmolkova

Assuming this is satisfactory for all SIGs, could we have a new release of https://github.com/open-telemetry/build-tools, so that the primitives that convert names are adjusted (or new primitives are provided)?

Then each SIG can use the fixed primitives to generate code that disambiguates collisions.

cc @open-telemetry/cpp-maintainers

alxbl · 2024-06-03T13:15:45Z

> it can be achieved with existing tooling with a macro similar to [...code...]

Thanks! I will use this for the time being until this is fixed in build-tools.

I agree with @marcalff that the to_const_name or to_xyzCase methods should not replace _ with . before doing the conversion; that way we avoid the collisions, as described in his post.

lmolkova · 2024-06-03T19:29:12Z

Discussed at the SemConv and maintainers SIGs:

- Options to resolve collisions are described in "Code generation: how to avoid naming collisions" #1118
  - Please vote/share thoughts
  - The active proposal being discussed is option 2:
    - remove _ as in @trask's proposal in #1031 (comment): messaging.client_id -> MESSAGING_CLIENTID and MessagingClientid
    - disallow renames that only add/remove _.
- Once there is a consensus on the resolution mechanism, we'll update the tooling (including build-tools)

lmolkova · 2024-06-17T16:51:27Z

The recommendation for messaging.client_id -> messaging.client.id would be to drop the old attribute.

Motivation:

- it's experimental and deprecated
- it's part of the messaging semconv that has the following warning (as a part of stabilization effort)

semantic-conventions/docs/messaging/messaging-spans.md

Lines 38 to 43 in cde003c

    
> **Warning**
> Existing messaging instrumentations that are using
> [v1.24.0 of this document](https://github.com/open-telemetry/semantic-conventions/blob/v1.24.0/docs/messaging/messaging-spans.md)
> (or prior) SHOULD NOT change the version of the messaging conventions that they emit
> until a transition plan to the (future) stable semantic conventions has been published.
> Conventions include, but are not limited to, attributes, metric and span names, and unit of measure.

Example on how to implement configurable dropping in Jinja - https://github.com/crossoverJie/semantic-conventions-java/pull/1/files
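The idea of configurable dropping can be sketched in a few lines. This does not mirror the linked Jinja change; `EXCLUDED_ATTRIBUTES` and `attributes_to_render` are hypothetical names used only for illustration:

```python
# Hypothetical, per-language exclusion list; the entry comes from this issue.
EXCLUDED_ATTRIBUTES = frozenset({"messaging.client_id"})


def attributes_to_render(attribute_ids, excluded=EXCLUDED_ATTRIBUTES):
    """Return the attribute ids the generator should emit constants for."""
    return [attr for attr in attribute_ids if attr not in excluded]


rendered = attributes_to_render([
    "messaging.client_id",
    "messaging.client.id",
    "messaging.message.id",
])
print(rendered)
```

Keeping the list explicit means each dropped attribute is a visible, reviewed decision rather than a silent side effect of name normalization.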

See #1118 (comment) for discussion on the general issue (and steps we're taking to prevent future collisions).

lmolkova · 2024-09-23T18:44:27Z

closing this one: see #1031 (comment) for client_id specific guidance and #1118 (comment) for the future approach.

For the time being such changes are prohibited and guarded with a policy check in CI #1209

dyladan added bug Something isn't working triage:needs-triage labels May 10, 2024

github-actions bot assigned AlexanderWert May 10, 2024

github-actions bot added the area:messaging label May 10, 2024

MrAlias mentioned this issue May 21, 2024

Add semconv/v1.26.0 open-telemetry/opentelemetry-go#5394

Closed

1 task

brettmc mentioned this issue May 22, 2024

messaging.client.id duplicated when generating semconv 1.26.0 #1058

Closed

jsuereth added the tooling Regarding build, workflows, build-tools, ... label May 22, 2024

dyladan mentioned this issue May 24, 2024

codegen question: how to handle enum values as constants #1064

Open

lmolkova mentioned this issue May 24, 2024

Fix possible collisions when attribute is renamed open-telemetry/semantic-conventions-java#72

Closed

codeboten mentioned this issue May 29, 2024

Add semantic conventions version v1.26.0 open-telemetry/opentelemetry-collector#10249

Closed

lmolkova mentioned this issue May 30, 2024

[chore] Add blank issue template #1097

Closed

marcalff mentioned this issue May 30, 2024

[SEMANTIC CONVENTIONS] Upgrade to version 1.26.0 open-telemetry/opentelemetry-cpp#2687

Merged

3 tasks

lalitb mentioned this issue May 31, 2024

[SEMANTIC CONVENTIONS] Upgrade to version 1.26.0 open-telemetry/opentelemetry-rust#1851

Merged

4 tasks

marcalff mentioned this issue Jun 3, 2024

[RELEASE] Prepare release 1.16.0 open-telemetry/opentelemetry-cpp#2688

Closed

lmolkova mentioned this issue Jun 3, 2024

Code generation: how to avoid naming collisions #1118

Open

trask mentioned this issue Jun 6, 2024

Update to semantic-conventions 1.26.0 open-telemetry/semantic-conventions-java#73

Merged

This was referenced Jun 6, 2024

Semconv codegen should produce different constant names if attribute is renamed _ -> `` open-telemetry/semantic-conventions-java#75

Closed

opentelemetry-semantic-conventions: bump to v1.26.0 open-telemetry/opentelemetry-python#3964

Merged

iRevive mentioned this issue Jul 9, 2024

Update opentelemetry-semconv to 1.26.0-alpha typelevel/otel4s#706

Merged

joaopgrassi removed the triage:needs-triage label Jul 9, 2024

lmolkova mentioned this issue Jul 15, 2024

[feature] Build tools/schema should define code-friendly attribute/metric/event/etc names open-telemetry/build-tools#323

Closed

lmolkova mentioned this issue Aug 3, 2024

Run otel.io checks and codegen in any language as a prereq to release #1317

Open

lmolkova mentioned this issue Aug 28, 2024

Use weaver for semantic convention codegen open-telemetry/semantic-conventions-java#70

Merged

7 tasks

lmolkova closed this as completed Sep 23, 2024

lmolkova mentioned this issue Oct 9, 2024

Implement code-generation hints to drop/rename attributes in case of a collision #1462

Open
