Adapt Elasticsearch spec to stability changes #1002

gregkalapos · 2024-05-03T14:45:14Z

Follow up from #974 (comment).

Changes

- Align Elasticsearch span names with the format defined in the general db spec. MongoDB is comparable and already follows this format. The idea is to have not only the operation (endpoint id) in the span name, but also the target.
- Add db.collection.name to the Elasticsearch spec; it is also used in the span name.
- Change the last fallback in the span name from http.request.method to db.system. The reasoning is that db.system is usually the last fallback for other DBs. This was also the outcome of "Better guidance on semantic conventions for database client call span names in case of missing information" (#704). In practice this fallback is used extremely rarely, since the endpoint id is usually available.
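
A minimal sketch of the span-name fallback described in these changes. It assumes the general db spec format is "operation, space, target"; the helper name and its parameters are hypothetical and not taken from the spec text, and only the cases mentioned above are covered.

```python
from typing import Optional


def build_span_name(
    endpoint_id: Optional[str],
    collection_name: Optional[str],
    db_system: str = "elasticsearch",
) -> str:
    """Return a span name using the fallback order sketched in this PR.

    Assumed order (illustrative only, not spec wording):
      1. "<endpoint id> <collection name>" when both are known
      2. "<endpoint id>" when only the operation is known
      3. db.system as the last fallback
    """
    if endpoint_id and collection_name:
        return f"{endpoint_id} {collection_name}"
    if endpoint_id:
        return endpoint_id
    # Rare in practice: the endpoint id is usually available.
    return db_system


print(build_span_name("search", "my-index"))
print(build_span_name("search", None))
print(build_span_name(None, None))
```

The last branch replaces the earlier http.request.method fallback with db.system, which is the only behavioural change to the fallback chain that the description mentions.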

Questions

- db.elasticsearch.cluster.name could be replaced with db.namespace. Do we want to do that?
- I think we could consider making this stable at this point. Any thoughts on this? I'd prefer to try to make this stable.

Merge requirement checklist

- CONTRIBUTING.md guidelines followed.
- Change log entry added, according to the guidelines in "When to add a changelog entry".
  - If your PR does not need a change log, start the PR title with [chore]
- schema-next.yaml updated with changes to existing conventions.

jack-berg

Just a couple of minor comments.

lmolkova · 2024-05-29T17:28:43Z

@gregkalapos
I was able to regenerate the registry and tables and pushed a commit with changes (hope you don't mind). So it works on my machine™️.

I think you have some problem with npm run fix:format in semantic-conventions/Makefile (lines 119 to 126 in 0dfe9fb):

    attribute-registry-generation:
    	docker run --rm -v $(PWD)/model:/source -v $(PWD)/docs:/spec -v $(PWD)/templates:/weaver/templates \
    		$(WEAVER_CONTAINER) registry generate \
    		  --registry=/source \
    		  --templates=/weaver/templates \
    		  markdown \
    		  /spec/attributes-registry/
    	npm run fix:format

PTAL at the error logs - maybe you need to install npm or there is some version conflict. If you believe there is something we can do better on the makefile/tooling, please create an issue or send a PR to fix :)

gregkalapos · 2024-05-29T17:44:23Z

Great, thanks for looking into this. Installing Node.js solved the issue (I had just reinstalled my machine...).

swallez · 2024-06-04T09:48:26Z

About the new db.collection.name attribute: the Elasticsearch API generally accepts a list of names and wildcards for index names, e.g. not only foo, but also foo,bar,baz-*. Will that be an issue? Expanding wildcards on the client side is not really an option because of the additional request it would require to have the list of matching indices (which can also be large).

trask · 2024-06-05T14:27:23Z

> About the new db.collection.name attribute: the Elasticsearch API generally accepts a list of names and wildcards for index names, e.g. not only foo, but also foo,bar,baz-*. Will that be an issue? Expanding wildcards on the client side is not really an option because of the additional request it would require to have the list of matching indices (which can also be large).

my interpretation of the spec is that db.collection.name should capture the first index name

if the first index name has a wildcard, I think that's ok, and db.collection.name would include the wildcard, e.g. foo*

gregkalapos · 2024-06-05T14:32:06Z

Thanks for raising this @swallez .

> About the new db.collection.name attribute: the Elasticsearch API generally accepts a list of names and wildcards for index names, e.g. not only foo, but also foo,bar,baz-*. Will that be an issue? Expanding wildcards on the client side is not really an option because of the additional request it would require to have the list of matching indices (which can also be large).

> my interpretation of the spec is that db.collection.name should capture the first index name

Correct, so the example of foo,bar,baz-* is fine - it's just foo in that case.

> if the first index name has a wildcard, I think that's ok, and db.collection.name would include the wildcard, e.g. foo*

That'd be my suggestion as well. Should we call this out explicitly? Happy to push a change.

swallez · 2024-06-05T16:05:03Z

> my interpretation of the spec is that db.collection.name should capture the first index name

The footnote for db.collection.name says: "If the collection name is parsed from the query text, it SHOULD be the first collection name found in the query and it SHOULD match the value provided in the query text including any schema and database name prefix"

In the context of Elasticsearch, the first collection name is found in the URL path and is not extracted from the query text. It defines the target of the operation. In this regard it is very similar to the target table in update FOO where... in SQL.

If we only take the first part of a multi-index target, we're losing important information. For example, if we run a delete by query that targets foo,bar but only keep foo in the span attribute, there is no way for a user to find that trace to understand why something was deleted from bar.

So IMHO we should keep the index target unmodified in the span attribute, even if it's actually a list:

- this is the value users put in their queries,
- truncating it loses information that can be essential to find the request that changed something in an index.

gregkalapos · 2024-06-05T16:17:49Z

@swallez there is also db.elasticsearch.path_parts.<key> with definition:

A dynamic value in the url path.

and examples:

db.elasticsearch.path_parts.index=test-index; db.elasticsearch.path_parts.doc_id=123

So in your example, I'd expect db.elasticsearch.path_parts.index=foo,bar - so users would know from db.elasticsearch.path_parts.index why the span deleted from bar.

So this is only for db.collection.name and for the span name.

Having said that, we already had an iteration of the comma separated vs. only keeping the first one question here: #1002 (comment)

I agree with you, this is a bit different since it comes from the request URL. @swallez what do you think: does db.elasticsearch.path_parts.index address this, or do you still think we should revert and put the comma-separated list into db.collection.name, even though other database semantic conventions only take the first one?
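
To make the delete-by-query example concrete, here is a minimal sketch of what the "first name only" reading would record. It assumes a URL template of the form '/{index}/_delete_by_query'; the helper names and the template-matching logic are hypothetical and not taken from any client library or from the spec.

```python
from typing import Dict


def extract_path_parts(template: str, path: str) -> Dict[str, str]:
    """Match a URL path against a template such as '/{index}/_delete_by_query'."""
    template_segments = template.strip("/").split("/")
    path_segments = path.strip("/").split("/")
    if len(template_segments) != len(path_segments):
        raise ValueError("path does not match template")
    parts: Dict[str, str] = {}
    for t_seg, p_seg in zip(template_segments, path_segments):
        if t_seg.startswith("{") and t_seg.endswith("}"):
            # Dynamic segment: record its raw value under the placeholder name.
            parts[t_seg[1:-1]] = p_seg
        elif t_seg != p_seg:
            raise ValueError("path does not match template")
    return parts


def attributes_first_only(template: str, path: str) -> Dict[str, str]:
    """Build span attributes, keeping only the first index in db.collection.name."""
    parts = extract_path_parts(template, path)
    attrs = {f"db.elasticsearch.path_parts.{key}": value for key, value in parts.items()}
    if "index" in parts:
        attrs["db.collection.name"] = parts["index"].split(",")[0]
    return attrs


attrs = attributes_first_only("/{index}/_delete_by_query", "/foo,bar/_delete_by_query")
print(attrs["db.elasticsearch.path_parts.index"])
print(attrs["db.collection.name"])
```

Under this reading the full target survives only in db.elasticsearch.path_parts.index, which is the trade-off being questioned here.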

swallez · 2024-06-05T17:01:57Z

First, asking users to look at db.elasticsearch.path_parts.index for multi-index requests because db.collection.name is incomplete in that case would be a pretty bad experience IMHO.

Also, there are some deviations from using index as the path parameter: operations on data streams use name and operations on aliases use alias. Data streams and aliases can be considered as collections, as many operations equally accept indices, data streams and aliases.

I really think we should consider multi-index targets as a kind of runtime alias (or an SQL partitioned table) as it is semantically equivalent, and all target indices are expected to have the same schema. This is very different from comma separated table names in a SQL query like select * from FOO, BAR where... where there is a clear distinction between the left and right parts of a join.

So IMHO keeping the index parameter as is is semantically consistent with the spec, since it defines what the target of the operation is, which - in the case of wildcards or comma-separated names - is equivalent to a runtime-defined alias.

gregkalapos · 2024-06-07T14:46:29Z

> Also, there are some deviations from using index as the path parameter: operations on data streams use name and operations on aliases use alias. Data streams and aliases can be considered as collections, as many operations equally accept indices, data streams and aliases.

Yeah, that indeed makes this too complicated to figure out for users.

I pushed an update to the PR now stating that it should be a comma separated list.

At the same time I also opened #1132 to discuss this in general. We had a conversation about this in the working group and thought it would be useful to revisit the decision that only the first collection name should be captured. Regardless of the outcome of that issue, my current proposal is to move on with this PR and explicitly state here that the collection name should be a comma-separated list of all targets.
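
A small sketch contrasting the two options discussed in this thread, assuming spans are represented as plain dicts; the function names are hypothetical. It shows why keeping the target unmodified lets a user find the request that touched bar.

```python
from typing import Dict, List


def collection_name(index_target: str, first_only: bool = False) -> str:
    """Return db.collection.name for an index target taken from the URL path.

    first_only=True models the earlier "first name only" reading;
    the default keeps the target unmodified, e.g. "foo,bar,baz-*".
    """
    if first_only:
        return index_target.split(",")[0]
    return index_target


def spans_mentioning(index: str, spans: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Find spans whose db.collection.name lists the given index name."""
    return [s for s in spans if index in s["db.collection.name"].split(",")]


target = "foo,bar"
span_all = {"db.collection.name": collection_name(target)}
span_first = {"db.collection.name": collection_name(target, first_only=True)}

# Searching for "bar" only finds the span that kept the full target.
print(len(spans_mentioning("bar", [span_all])))
print(len(spans_mentioning("bar", [span_first])))
```

Wildcards are passed through as-is in both modes, since expanding them on the client side was ruled out earlier in the discussion.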

gregkalapos marked this pull request as ready for review May 3, 2024 14:51

gregkalapos requested review from a team May 3, 2024 14:51

github-actions bot assigned jsuereth May 3, 2024

gregkalapos changed the title ~~Adapt Elasticsearch spec to stability changes~~ [chore] Adapt Elasticsearch spec to stability changes May 3, 2024

gregkalapos changed the title ~~[chore] Adapt Elasticsearch spec to stability changes~~ Adapt Elasticsearch spec to stability changes May 3, 2024

lmolkova reviewed May 3, 2024

model/trace/database.yaml (outdated, resolved)

lmolkova reviewed May 3, 2024

docs/database/elasticsearch.md (outdated, resolved)

trask reviewed May 7, 2024

docs/database/elasticsearch.md (outdated, resolved)

gregkalapos added 3 commits May 29, 2024 16:28

Adapt es spec to stability changes

c2645b6

Create align_es_spec.yaml

3569779

Elasticsearch: adapt span name and use db.namespace

0c70118

gregkalapos force-pushed the align_es_spec branch from a76c5ed to 0c70118 Compare May 29, 2024 15:46

Update elasticsearch.md

7c1dcd7

jack-berg approved these changes May 29, 2024

docs/database/elasticsearch.md (outdated, resolved)

.chloggen/align_es_spec.yaml (outdated, resolved)

gregkalapos added 2 commits May 29, 2024 18:13

Add info about endpoint id

7cb1e3c

Update db.md

238e2cd

trask reviewed May 29, 2024

model/registry/deprecated/db.yaml (outdated, resolved)

steverao and others added 2 commits May 29, 2024 19:08

Add semantic convention of influxDB (open-telemetry#949)

b672e5b

Co-authored-by: Liudmila Molkova <limolkova@microsoft.com>

Update db.yaml

d823281

gregkalapos force-pushed the align_es_spec branch from 9d42877 to d823281 Compare May 29, 2024 17:09

lmolkova reviewed May 29, 2024

gregkalapos and others added 2 commits May 29, 2024 19:14

Merge branch 'main' into align_es_spec

ee0fbbf

regenerate tables

0dfe9fb

gregkalapos and others added 2 commits May 29, 2024 19:29

Apply suggestions from code review

e8e996b

Co-authored-by: Liudmila Molkova <limolkova@microsoft.com>

Review feedback.

baf10b3

trask reviewed May 29, 2024

docs/database/elasticsearch.md (resolved)

gregkalapos commented May 31, 2024

model/trace/database.yaml (outdated, resolved)

trask approved these changes May 31, 2024

lmolkova reviewed May 31, 2024

docs/database/elasticsearch.md (outdated, resolved)

model/registry/deprecated/db.yaml (outdated, resolved)

model/trace/database.yaml (outdated, resolved)

model/trace/database.yaml (outdated, resolved)

gregkalapos added 3 commits June 3, 2024 18:38

Merge branch 'main' into align_es_spec

fd6708a

Review feedback

e1dc644

Update db.md

50f38e0

lmolkova approved these changes Jun 3, 2024

model/trace/database.yaml (outdated, resolved)

Merge branch 'main' into align_es_spec

b8770d2

gregkalapos added 2 commits June 5, 2024 16:28

Move Elastic Cloud note

8140f0e

Merge branch 'align_es_spec' of https://github.com/gregkalapos/semantic-conventions into align_es_spec

85e7ed0

gregkalapos mentioned this pull request Jun 7, 2024

db.collection.name - reconsider only capturing the first collection name #1132

Closed

Update database.yaml

1df6af3

trask reviewed Jun 7, 2024

docs/database/elasticsearch.md (resolved)

gregkalapos added 2 commits June 7, 2024 16:47

Update elasticsearch.md

841230e

Update schema-next.yaml

8d75073

trask reviewed Jun 7, 2024

model/trace/database.yaml (resolved)

trask approved these changes Jun 7, 2024

Merge branch 'main' into align_es_spec

17306b9

lmolkova merged commit 315a717 into open-telemetry:main Jun 7, 2024
11 of 12 checks passed
