Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Database general-purpose target attribute #521

Open
alanwest opened this issue Nov 14, 2023 · 13 comments
Open

Database general-purpose target attribute #521

alanwest opened this issue Nov 14, 2023 · 13 comments
Labels

Comments

@alanwest
Copy link
Member

alanwest commented Nov 14, 2023

Proposal for new db.target attribute

The type of targets available for a database operation is dependent on a database system. For example, the target of a SQL database operation may be a table, function, or stored procedure. Other database systems may refer to their targets as collections or indexes, etc.

Currently, there exists database system specific attributes for describing this. Examples include:

Introducing a general-purpose attribute would enable backends to provide a generic experience for faceting database operations by their target without requiring the backend have knowledge of all the database system specific attributes. It would also enable end users to more easily explore their data at a high-level without having to know the specific database systems in use.

I propose a new call-level attribute db.target for describing the target of a database operation. The term "target" may not be the greatest, but I'm using it for lack of a better term for the moment. I considered db.operation.target but of course db.operation is already an attribute. Maybe db.operation could become db.operation.type thereby allowing db.operation.target.

Issues to consider

  1. Similar to db.operation, the value of the db.target may not be readily available without parsing the value of db.statement, e.g. SELECT * FROM Users the db.operation is SELECT and the db.target is Users. The specification advises against parsing db.statement to derive other attributes. I consider this a limitation particularly once we have a database client operation metric (see Database client operation duration metric #512). The db.statement attribute is fine for spans but because of its high-cardinality will not work for metrics. Therefore, having db.operation and db.target attributes on metrics will be useful.
  2. There may be many targets for a single database operation, e.g. SELECT * FROM Users U JOIN Company C ON ....

Perhaps parsing db.statement can be an opt-in feature?

Various backends have prior art for addressing these issues. For example:

  • Elastic APM performs a limited parsing to extract the operation and outer-most table name from a SQL statement. This is described in their agent specifications.
  • New Relic APM agents perform a similar limited parsing and includes operation/target on metric data. New Relic's agent specs are unfortunately not publicly available, so I do not have a link to share. Though here is New Relic Java agent source code that performs the parsing.
  • New Relic agents offer a feature for providing a "hint" of the database target. The hint can be added as a comment prior to the database statement to simplify parsing, e.g., /* nrhint:Company */ SELECT * FROM Users U JOIN Company C ON ....
@pyohannes
Copy link
Contributor

In the messaging space we have a similar scenario, there we settled on the more or less generic term destination, which covers both queues and topics.

We also ended up making messaging.destination a namespace. There's messaging.destination.name, and then there are further attributes to describe the destination. Maybe something similar makes sense if one decides to go with the db.target approach.

@mateuszrzeszutek
Copy link
Member

Perhaps parsing db.statement can be an opt-in feature?

FWIW the javaagent already does that. We have a query sanitization feature that replaces all the query variables with ?. It uses a performant tokenizer that scans the query looking for the interesting parts: https://github.com/open-telemetry/opentelemetry-java-instrumentation/blob/main/instrumentation-api-semconv/src/main/jflex/SqlSanitizer.jflex
Since we're already iterating over the query once, we figured we might as well try to extract the table name while we're at it.

@jcocchi
Copy link
Contributor

jcocchi commented Feb 9, 2024

Currently, there exists database system specific attributes for describing this. Examples include:

Cosmos DB also has a custom attribute for this called db.cosmosdb.container. Standardizing on a term for the semantic convention makes sense to me.

Perhaps parsing db.statement can be an opt-in feature?

Wouldn't the source library be responsible for populating the db.target, so each library would presumably only need to know how to parse its own db.statement (vs. the APM tool doing it)? Cosmos DB currently doesn't populate db.statement at all, and it doesn't apply to most of our operations.

@dicko2
Copy link

dicko2 commented Feb 11, 2024

This is one of our biggest barriers from moving from app insights to otel. App insights includes the whole statement, we use mainly stored procs though, and where we don't its parameterised query strings. It would cause cardinality issues if people are injecting values into the query. But I think people shouldn't be doing this anyway.

I don't see the value of the target begin the table alone. Think about what you need to debug a problem. If you put only a table then these two queries will be grouped together

SELECT * FROM users

SELECT name FROM users WHERE userid = @userid

They​ will have drastically different performance right? So why group them?

I think adding the whole query is the way to go imo.

@trask
Copy link
Member

trask commented Feb 13, 2024

It would cause cardinality issues if people are injecting values into the query. But I think people shouldn't be doing this anyway.

I believe even properly parameterized statements are too high-cardinality for metrics, e.g. it's not uncommon to have 100's of tables

@dicko2
Copy link

dicko2 commented Feb 14, 2024

If you don't have the query. I would question the use of the metric. What are you measuring?

@trask
Copy link
Member

trask commented Feb 16, 2024

Naming idea from semconv meeting: db.entity

Related: "entity" term may be used for identifying resource attributes

@jack-berg
Copy link
Member

jack-berg commented Feb 16, 2024

Compiling the query syntax docs from a sampling of the different db systems we reference:

db.target seems like a decent choice.

I also like db.collection, since it implies a collections of rows, columns, records, documents, entries in an index, etc.

@dicko2
Copy link

dicko2 commented Feb 18, 2024

You could also be calling a stored procedure too

@joaopgrassi
Copy link
Member

joaopgrassi commented Feb 20, 2024

If you don't have the query. I would question the use of the metric. What are you measuring?

One could measure all db.operation in a certain db.target - say "Give me a count of all SELECT in target MYTABLE. Or "Give me a count of all EXEC in target MY_SPROC". If that's useful, that's another question.

The statement in a metric makes not much sense, but makes total sense in spans.

@lmolkova
Copy link
Contributor

lmolkova commented Feb 21, 2024

Assuming I do cross-table query, would db.operation and db.target be arrays?
Also, it'd be great to have it as opt-in on metrics, but array attribute on metrics is questionable.

I.e.

  • the default experience (for SQL) is not super-helpful (no table name, no statement, no operation)
  • opt-in statement parsing can add db.target and db.operation, but it's still far from perfect, especially if there is more than one operation/target

An 'ideal' solution as I see it would be if users could optionally give queries a nick-name which would appear on spans and metrics.
We don't have a common way to achieve it, but if there was one and we could get this nick-name, we would put it into attribute like db.query.name.

The propagation mechanism could look like this:

Context dbScope = Context.current();
try(Scope scope = dbScope.with(ContextKey.named("db.query.name"), "get user by id").makeCurrent()) {
    // SELECT name FROM users WHERE userid = @userid
}

It could be fragile in some languages/cases and probably needs a spec change to make it possible in all languages.

What we can do now is to define such attribute, add it to metrics as conditionally_required: when available (even though it never is). This way semconv can go stable, default experience would not be great, but we'll have means in the semconv to create a good one.

We have similar cases (messaging.destination.template, http.route on client)

@lmolkova
Copy link
Contributor

lmolkova commented Feb 21, 2024

Another option could be that:
assuming db.statement can be templated, user can opt-in into collecting a limited number of db.statement on metrics.

We could potentially extend metrics advisory params api to configure an explicit allow-list for db.statement values.

this way cardinality would be fully in the user hands.

@trask
Copy link
Member

trask commented Apr 24, 2024

I think this is resolved by #870?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
Development

No branches or pull requests

9 participants