Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adapt Elasticsearch spec to stability changes #1002

Merged
merged 22 commits into from
Jun 7, 2024

Conversation

gregkalapos
Copy link
Member

@gregkalapos gregkalapos commented May 3, 2024

Follow up from #974 (comment).

Changes

  • Align Elasticsearch span names with the format defined in the general db spec. MongoDB is comparable and it already follows this format. The idea is to not only have the operation (endpoint id) in the span name, but also the target.
  • Adding db.collection.name to the Elasticsearch spec, which is used in the span name as well.
  • Changed the last fallback in the span name from http.request.method to db.system. The thinking behind this is that db.system is usually the last fallback for other DBs. This was also the outcome in Better guidance on semantic conventions for database client call span names in case of missing information #704. In practice this fallback is used extremely rarely since endpoint id is usually available.

Questions

  • db.elasticsearch.cluster.name could be replaced with db.namespace. Do we want to do that?
  • I think we could consider making this stable at this point. Any thoughts on this? I'd prefer to try to make this stable.

Merge requirement checklist

@gregkalapos gregkalapos marked this pull request as ready for review May 3, 2024 14:51
@gregkalapos gregkalapos requested review from a team May 3, 2024 14:51
@gregkalapos gregkalapos changed the title Adapt Elasticsearch spec to stability changes [chore] Adapt Elasticsearch spec to stability changes May 3, 2024
@gregkalapos gregkalapos changed the title [chore] Adapt Elasticsearch spec to stability changes Adapt Elasticsearch spec to stability changes May 3, 2024
model/trace/database.yaml Outdated Show resolved Hide resolved
Copy link
Member

@jack-berg jack-berg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a couple of minor comments.

docs/database/elasticsearch.md Outdated Show resolved Hide resolved
.chloggen/align_es_spec.yaml Outdated Show resolved Hide resolved
steverao and others added 2 commits May 29, 2024 19:08
Co-authored-by: Liudmila Molkova <limolkova@microsoft.com>
model/trace/database.yaml Outdated Show resolved Hide resolved
model/trace/database.yaml Outdated Show resolved Hide resolved
docs/database/elasticsearch.md Outdated Show resolved Hide resolved
docs/database/elasticsearch.md Outdated Show resolved Hide resolved
.chloggen/influxdb.yaml Outdated Show resolved Hide resolved
@lmolkova
Copy link
Contributor

lmolkova commented May 29, 2024

@gregkalapos
I was able to regenerate the registry and tables and pushed a commit with changes (hope you don't mind). So it works on my machine™️.

I think you have some problem with npm run fix:format in

attribute-registry-generation:
docker run --rm -v $(PWD)/model:/source -v $(PWD)/docs:/spec -v $(PWD)/templates:/weaver/templates \
$(WEAVER_CONTAINER) registry generate \
--registry=/source \
--templates=/weaver/templates \
markdown \
/spec/attributes-registry/
npm run fix:format

PTAL at the error logs - maybe you need to install npm or there is some version conflict. If you believe there is something we can do better on the makefile/tooling, please create an issue or send a PR to fix :)

gregkalapos and others added 2 commits May 29, 2024 19:29
Co-authored-by: Liudmila Molkova <limolkova@microsoft.com>
@gregkalapos
Copy link
Member Author

@gregkalapos I was able to regenerate the registry and tables and pushed a commit with changes (hope you don't mind). So it works on my machine™️.

I think you have some problem with npm run fix:format in

attribute-registry-generation:
docker run --rm -v $(PWD)/model:/source -v $(PWD)/docs:/spec -v $(PWD)/templates:/weaver/templates \
$(WEAVER_CONTAINER) registry generate \
--registry=/source \
--templates=/weaver/templates \
markdown \
/spec/attributes-registry/
npm run fix:format

PTAL at the error logs - maybe you need to install npm or there is some version conflict. If you believe there is something we can do better on the makefile/tooling, please create an issue or send a PR to fix :)

Great, thanks for looking into this. Installing Node.js solves the issue (just reinstalled my machine... )

docs/database/elasticsearch.md Outdated Show resolved Hide resolved
model/registry/deprecated/db.yaml Outdated Show resolved Hide resolved
model/trace/database.yaml Outdated Show resolved Hide resolved
model/trace/database.yaml Outdated Show resolved Hide resolved
model/trace/database.yaml Outdated Show resolved Hide resolved
@swallez
Copy link

swallez commented Jun 4, 2024

About the new db.collection.name attribute: the Elasticsearch API generally accepts a list of names and wildcards for index names, e.g. not only foo, but also foo,bar,baz-*. Will that be an issue? Expanding wildcards on the client side is not really an option because of the additional request it would require to have the list of matching indices (which can also be large).

@trask
Copy link
Member

trask commented Jun 5, 2024

About the new db.collection.name attribute: the Elasticsearch API generally accepts a list of names and wildcards for index names, e.g. not only foo, but also foo,bar,baz-*. Will that be an issue? Expanding wildcards on the client side is not really an option because of the additional request it would require to have the list of matching indices (which can also be large).

my interpretation of the spec is that db.collection.name should capture the first index name

if the first index name has a wildcard, I think that's ok, and db.collection.name would include the wildcard, e.g. foo*

@gregkalapos
Copy link
Member Author

Thanks for raising this @swallez .

About the new db.collection.name attribute: the Elasticsearch API generally accepts a list of names and wildcards for index names, e.g. not only foo, but also foo,bar,baz-*. Will that be an issue? Expanding wildcards on the client side is not really an option because of the additional request it would require to have the list of matching indices (which can also be large).

my interpretation of the spec is that db.collection.name should capture the first index name

Correct, so the example of foo,bar,baz-* is fine - it's just foo in that case.

if the first index name has a wildcard, I think that's ok, and db.collection.name would include the wildcard, e.g. foo*

That'd be my suggestion as well. Should we call this out explicitly? Happy to push a change.

@swallez
Copy link

swallez commented Jun 5, 2024

my interpretation of the spec is that db.collection.name should capture the first index name

The footnote for db.collection.name says: "If the collection name is parsed from the query text, it SHOULD be the first collection name found in the query and it SHOULD match the value provided in the query text including any schema and database name prefix"

In the context of Elasticsearch, the first collection name is found in the URL path and is not extracted from the query text. It defines the target of the operation. In this regard it is very similar to the target table in update FOO where... in SQL.

If we only take the first part of a multi-index target, we're loosing important information. For example, if we run a delete by query that targets foo,bar but only keep foo in the span attribute, there is no way for a user to find that trace to understand why something was deleted from bar.

So IMHO we should keep the index target unmodified in the span attribute, even if it's actually a list:

  • this is the value users put in their queries,
  • truncating it looses information that can be essential to find the request that changed something in an index.

@gregkalapos
Copy link
Member Author

@swallez there is also db.elasticsearch.path_parts.<key> with definition:

A dynamic value in the url path.

and examples:

db.elasticsearch.path_parts.index=test-index; db.elasticsearch.path_parts.doc_id=123

So in your example, I'd expect db.elasticsearch.path_parts.index=foo,bar - so users would know from db.elasticsearch.path_parts.index why the span deleted from bar.

So this is only for db.collection.name and for the span name.

Having said that, we already had an iteration of the comma separated vs. only keeping the first one question here: #1002 (comment)

I agree with you, this is a bit different since this is coming from the request url. @swallez what do you think: does db.elasticsearch.path_parts.index address this, or you still think we should revert back and add the comma separated list into db.collection.name, even though other database semantic conventions only take the 1. one?

@swallez
Copy link

swallez commented Jun 5, 2024

First, asking users to look at db.elasticsearch.path_parts.index for multi-index requests because db.collection.name is incomplete in that case would be a pretty bad experience IMHO.

Also, there are some deviations from using index as the path parameter: operations on data streams use name and operations on aliases use alias. Data streams and aliases can be considered as collections, as many operation equally accept indices, data streams and aliases.

I really think we should consider multi-index targets as a kind of runtime alias (or an SQL partitioned table) as it is semantically equivalent, and all target indices are expected to have the same schema. This is very different from comma separated table names in a SQL query like select * from FOO, BAR where... where there is a clear distinction between the left and right parts of a join.

So IMHO keeping the index parameter as is is semantically consistent with the spec, since it defines what the target of the operation is, which - in the case of wildcards or comma-separated names - is equivalent to a runtime-defined alias.

@gregkalapos
Copy link
Member Author

Also, there are some deviations from using index as the path parameter: operations on data streams use name and operations on aliases use alias. Data streams and aliases can be considered as collections, as many operation equally accept indices, data streams and aliases.

Yeah, that indeed makes this too complicated to figure out for users.

I pushed an update to the PR now stating that it should be a comma separated list.

At the same time I also opened #1132 to discuss this in general. We had a conversation about this in the working group and we thought it'd be useful to reiterate the decision that only the 1. collection name should be captured. Regardless of the outcome of that issue, my current proposal would be to move on with this PR and explicitly stating here that collection name should be a comma separated list of all targets.

@lmolkova lmolkova merged commit 315a717 into open-telemetry:main Jun 7, 2024
11 of 12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

8 participants