Clarify usage of "component" attribute in semantic conventions #336

tigrannajaryan · 2019-10-24T13:06:32Z

Span Semantic conventions currently define an attribute "component". It is unclear what exactly this attribute is intended to reflect.

Semantic conventions define different values of "component" attribute in each of the sections of the document, as if the sections are exclusive of one another, which seems to be an incorrect assumption. A counter-example to that assumption would be a Database system which is accessible via HTTP transport (e.g. https://wiki.postgresql.org/wiki/HTTP_API). In this counter-example it is unclear whether the "component" should be set to "http" or to "postgresql".

Unless there is a clear intent on the meaning of "component" attribute and what value should be populated when the emitting component logically belongs to different kinds of emitters defined in this document I suggest that we remove the attribute altogether. Consumers of attributes which are interested in learning the type of the component that emitted the span can do so by examining the presence or absence of "Required" attribute of appropriate attribute group. For example the presence of "http.method" attribute would mean that the emitter is an "http" component (perhaps in addition to being something else at the same time).

Oberon00 · 2019-10-24T13:17:45Z

Related: #245

Oberon00 · 2019-10-24T14:40:38Z

In this counter-example it is unclear whether the "component" should be set to "http" or to "postgresql".

At some point the back end at least will often want to decide whether this Span is a DB or a HTTP span (e.g. to display an icon). So the question is: Should the back end decide this (e.g. using heuristics like HTTP + DB => DB), or should the instrumentation decide this? If the latter, the component attribute is helpful.

One could also make the point that in practice most likely, the Postresql integration would create a Span with DB attributes and the HTTP integration would create a child span with HTTP attributes instead of the attributes being mixed in a single span. I wonder if we can find an example where we really would have mixed span types.

jmacd · 2019-10-24T16:49:29Z

My understanding had been that component would equal the name of the named Tracer or Meter.

tigrannajaryan · 2019-10-24T16:56:50Z

My understanding had been that component would equal the name of the named Tracer or Meter.

The semantic convention requires hard-coded values for "component" field based on the type of the emitter, e.g. "http", so I think it does not align with your understanding (not saying your understanding is wrong - I am not sure what is right here - to me the whole "component" attribute is superfluous).

jmacd · 2019-10-24T17:31:09Z

@tedsuo What do you think?

tedsuo · 2019-10-24T20:22:05Z

I think we can resolve this by switching to a structured data format. In this case, the "component" tag becomes the key to a dictionary containing semantically defined keys and values.

The named tracer which is reporting the spans can also be represented as span attributes. For clarity, let's call that dictionary key "reporter" rather than "component."

For example, a python client speaking to mongodb over HTTP might generate a span with the following attributes:

  {
    "reporter": {
      "name": "pymongo-opentelemetry",
      "version": "1.2.3"
    },
    "span":{
       "kind": "client",
     },
    "http_client": {
      "method": "GET",
      "url": "https://example.com:779/path/12314/?q=ddds#123",
      "status_code": "200"
    },
    "db": {
      "type": "mongodb",
      "statement": "db.getCollection('foo').find()",
      "instance": "customers",
    },
    "tcp": {
      "ipv4": "127.0.0.1",
      "port": "80"
    }
  }

In this world, as long as there is no collision between key names, there shouldn't be an issue with a span representing multiple types.

@Oberon00 I understand the issue you bring up: Should instrumentation define a span "type" to make some UI decisions clearer, such as what icon to show? I lean towards no: it would be better for instrumentation to focus on providing unambiguous, objective data.

Subjective decisions, such as wether the icon should be "http" or "client" or "db" might be difficult. Which icon would you pick for an HTTP PUT request to an address which clearly describes an AWS S3 bucket? S3 logo, db icon, client request icon? Best to apply labels like that when the data hits a specific back-end system.

tigrannajaryan · 2019-10-24T21:10:03Z

@tedsuo the reporter as you defined it makes a lot of sense. That is different from how component is defined today (and I find reporter a lot more useful and meaningful than component).

Subjective decisions, such as wether the icon should be "http" or "client" or "db" might be difficult. Which icon would you pick for an HTTP PUT request to an address which clearly describes an AWS S3 bucket? S3 logo, db icon, client request icon? Best to apply labels like that when the data hits a specific back-end system.

I tend to agree. I think backends should make these decisions and to make such decisions they don't need component attribute (nor probably reporter as you defined it), since they can infer the same (and more) information from the presence of relevant attributes defined by semantic conventions (e.g. http.method or db.type)

That said I disagree with this part:

I think we can resolve this by switching to a structured data format.

I think the structured vs unstructured format is orthogonal to the issue I raised. We can encode reporter data using equivalent flat structure with no loss of semantics and using the same dot notation as currently used by semantic conventions, e.g.:

{
  "reporter.name": "pymongo-opentelemetry",
  "reporter.version": "1.2.3"
}

jmacd · 2019-10-24T21:27:33Z

@tigrannajaryan I see no value in having a hard-coded list of component values the way the spec currently is written. @tedsuo you used an example of what I think would come from a named Tracer, not one of the hard-coded values currently in the spec. Are you thinking of this component/reporter name as equal to the name of the Tracer or Meter?

I would propose that we specify a conventional name for the name of of the named Meter or Tracer. It can be reporter if you like, or component. It should have structure, so a .name and a .version, the version having been discussed in https://github.com/open-telemetry/oteps/blob/master/text/0038-version-semantic-attribute.md.

tedsuo · 2019-10-25T01:07:16Z

@jmacd sounds good. I sense that we are all mostly on the same page, so just to summarize what I believe we are suggesting:

We can remove the current component tag. Spans do not require a type field other than span_kind.
Instead, spans can be composed of multiple "types" at the same time, via the semantic definitions. Attributes for each semantic definition are namespaced. So, all HTTP client attributes are namespaced with http_client. Whether this namespacing is done via dot-notation or structured data is moot.
The name value which comes from the TracerFactory was intended to represent the specific package name of the instrumentation library - for example io.opentelemetry.contrib.mongodb or pymongo-opentelemetry. It was not intended to match a component value, such as db. If this TracerFactory data needs to be stapled onto every span, that could be done with a new semantic definition, such as reporter.name and reporter.version.

(BTW, I'm not necessarily arguing we should staple the TracerFactory data onto every span, just pointing out how the data would be modelled.)

tigrannajaryan · 2020-02-18T17:26:48Z

"component" attribute has been removed from semantic conventions via #447
Closing this as no longer applicable.

jmacd mentioned this issue Nov 25, 2019

Miscellaneous additional semantic convention attribute names #362

Closed

tigrannajaryan mentioned this issue Dec 12, 2019

Defined additional semantic convention attribute names #383

Closed

arminru mentioned this issue Jan 23, 2020

Define span attributes for messaging systems #418

Merged

tigrannajaryan closed this as completed Feb 18, 2020

TuckTuckFloof pushed a commit to TuckTuckFloof/opentelemetry-specification that referenced this issue Oct 15, 2020

add docfx ci (open-telemetry#336)

e3a851b

Oberon00 mentioned this issue Oct 30, 2020

Add category attribute to spans #1161

Open

mariojonke mentioned this issue Jan 27, 2021

Remove 'component' span attribute in instrumentations open-telemetry/opentelemetry-python-contrib#300

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Clarify usage of "component" attribute in semantic conventions #336

Clarify usage of "component" attribute in semantic conventions #336

tigrannajaryan commented Oct 24, 2019 •

edited

Loading

Oberon00 commented Oct 24, 2019

Oberon00 commented Oct 24, 2019 •

edited

Loading

jmacd commented Oct 24, 2019

tigrannajaryan commented Oct 24, 2019

jmacd commented Oct 24, 2019

tedsuo commented Oct 24, 2019 •

edited

Loading

tigrannajaryan commented Oct 24, 2019

jmacd commented Oct 24, 2019

tedsuo commented Oct 25, 2019 •

edited

Loading

tigrannajaryan commented Feb 18, 2020

Clarify usage of "component" attribute in semantic conventions #336

Clarify usage of "component" attribute in semantic conventions #336

Comments

tigrannajaryan commented Oct 24, 2019 • edited Loading

Oberon00 commented Oct 24, 2019

Oberon00 commented Oct 24, 2019 • edited Loading

jmacd commented Oct 24, 2019

tigrannajaryan commented Oct 24, 2019

jmacd commented Oct 24, 2019

tedsuo commented Oct 24, 2019 • edited Loading

tigrannajaryan commented Oct 24, 2019

jmacd commented Oct 24, 2019

tedsuo commented Oct 25, 2019 • edited Loading

tigrannajaryan commented Feb 18, 2020

tigrannajaryan commented Oct 24, 2019 •

edited

Loading

Oberon00 commented Oct 24, 2019 •

edited

Loading

tedsuo commented Oct 24, 2019 •

edited

Loading

tedsuo commented Oct 25, 2019 •

edited

Loading