Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update enduser domain and add enduser.authentication.id #1456

Open
wants to merge 30 commits into
base: main
Choose a base branch
from

Conversation

heyams
Copy link
Contributor

@heyams heyams commented Oct 7, 2024

Fixes #1104

@heyams heyams force-pushed the heya/add-enduser-namespace branch from 0a8b0e8 to ee0970f Compare October 7, 2024 19:06
model/enduser/registry.yaml Outdated Show resolved Hide resolved
model/enduser/registry.yaml Outdated Show resolved Hide resolved
model/enduser/registry.yaml Outdated Show resolved Hide resolved
brief: >
Describes information about the end user, which can be used as a subdomain of browser, client, or user domains.
attributes:
- id: enduser.id
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems confusing to set enduser.id = "QdH5CAWJgqVT4rOr0qtumf" and enduser.authentication.id = "lmolkova", based on the

https://github.com/open-telemetry/semantic-conventions/pull/1146/files#r1712997369 and https://github.com/open-telemetry/semantic-conventions/pull/1146/files#r1710187141

It'd be more clear if we called this one enduser.tracking.id or enduser.anonymous.id so that people would not put PII there.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

per earlier discussion, it seemed that anonymous was confusing to some.
i'm good with enduser.anonymous_id or enduser.tracking_id. neither tracking nor anonymous a namespace, nesting here doesn't seem to follow the naming convention?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you mean that enduser.tracking.id would not follow the naming convention? Why?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like enduser.tracking.id

Copy link
Contributor Author

@heyams heyams Oct 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some examples

messaging.consumer_id: messaging.consumer.id

nesting is used for .
if not, use _

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lmolkova @trisch-me Please review and fill in anything that i may have missed.:

enduser.pseudo.id
- pros: it's clear to know this id is not authenticated
- cons: it might lead to misinterpretation of this id, like it's not a real id, a testing id, a temporary id?

enduser.tracking.id' - pros: it's clear that this id is used to track a particular user. - cons: tracking` may raise privacy concerns, as it implies monitoring user behavior, which could lead to user distrust. it also lacks context of what exactly is being tracked (e.g. user actions, sessions, locations, etc)

enduser.unauth.id
- pros: unauth is short
- cons: unauth is ambiguous, as it can be unauthenticated or unauthorized. additionally, acronym is not a good naming practice and leads to more confusion.

enduser.temp.id or enduser.transient.id
- pros: it suggests that this id is temporary and associated with user who has not been authenticated.
- cons: it lacks context about the id is temporary for what context (e.g. session, authentication)

enduser.unauthenticated.id
- pro: it's clear to indicate an authenticated user.
- cons: it collides with enduser.authentication.id, which can be renamed to enduser.authenticated.id, then it would have been fine?

enduser.anonymous.id
- pros: it's clear that this id is anonymous.
- cons: it can be confusing and lacks context. as long we have a clear documentation, this should be ok?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kind of like these three:

  • enduser.pseudo.id
  • enduser.transient.id
  • enduser.ephemeral.id

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I attended Client SIG this morning,
they also preferred enduser.pseudo.id after going through this list.

Copy link
Contributor

@trisch-me trisch-me Dec 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what were the cons for enduser.unauthenticated.id?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we have already discussed and agreed to have enduser.authentication.id. authentication as a sub-namespace under enduser. anything that is not under authentication is either unauthenticated or unauthorized or anonymous or some random id.

@heyams heyams changed the title Update enduser domain and add authentication as a subdomain Update enduser domain and add enduser.authentication.id Oct 22, 2024
@heyams heyams changed the title Update enduser domain and add enduser.authentication.id Update enduser domain and add enduser.authentication.id Oct 22, 2024
@heyams
Copy link
Contributor Author

heyams commented Oct 23, 2024

need some advice with this weaver error, it didn't point to any files that I have modified. it's persisting.
image

@trask
Copy link
Member

trask commented Oct 23, 2024

need some advice with this weaver error, it didn't point to any files that I have modified. it's persisting. image

if you scroll up from there, you'll see:

ℹ Validating: $/home/weaver/target/general/attributes.md
✖ Could not find: identity

which looks related to this file that was deleted in your PR: model/enduser/deprecated/common.yaml

@heyams heyams marked this pull request as ready for review November 22, 2024 23:41
@heyams heyams requested review from a team as code owners November 22, 2024 23:41
@heyams heyams requested a review from joaopgrassi November 26, 2024 19:16
Comment on lines 19 to 20
Identifier of an anonymous end user who interacts with a system.
This identifier may be unique only through best-effort means and does not imply that the user is authenticated to the system.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

meeting notes from semantic convention SIG meeting today:

  • is this something that OpenTelemetry instrumentation is planning populate, or is this only something that vendor-specific instrumentation is planning to populate?
  • if it's something that OpenTelemetry instrumentation is planning to populate, what would the implementation look like, e.g. would this be stored in a persistent cookie? would it be stamped onto a specific event?
  • there is a desire not to add attributes into the semconv repo when only a single vendor has expressed interest in them
  • there is also a desire not to add attributes into the semconv repo without having any span/event/metric definitions that use them

the recommendation for next steps was to discuss this in the Client SIG

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@heyams were those questions addressed during client sig meeting?

Copy link
Contributor Author

@heyams heyams Dec 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, we kind of covered it. cc @MSNev

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you post the answers here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this something that OpenTelemetry instrumentation is planning populate

  • Yes, this will be added by instrumentations either directly or indirectly (via a psuedo user manager -- like the Session Manager)

if it's something that OpenTelemetry instrumentation is planning to populate, what would the implementation look like, e.g. would this be stored in a persistent cookie? would it be stamped onto a specific event?

  • It will need something like the SessionManager implementation to "manager" the lifecycle of this value, some environments would be just to create a simple random value for App start (like Android) while in a browser which is stateless then the "user manager" would most likely use cookies (but it could also use session storage)

there is a desire not to add attributes into the semconv repo when only a single vendor has expressed interest in them

  • This will not be single vender specific, but it is highly RUM specific

there is also a desire not to add attributes into the semconv repo without having any span/event/metric definitions that use them

  • It should be available both Spans and Logs, while it could be available for Metrics it's cardinality (because its a random value) should not necessarily be used for metrics...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks! a couple of followup questions:

  • if/when entity (or mutable resource) attributes are available, do you see enduser.pseudo.id as one of those?
  • are there any specific span or event semantic conventions that we can add enduser.pseudo.id to? or is enduser.pseudo.id hopefully just an entity (or mutable resource) attribute?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe we discussed these in the client SIG. I will wait for @MSNev's response.

Copy link
Contributor

@MSNev MSNev Dec 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • If/when mutable resources attributes (whether they are entities or shorter lived lifespan resources ) then this (and session) could possibly by represented there.

are there any specific span or event semantic conventions that we can add enduser.pseudo.id to?

No, not specifically as this really is just additional context details.

Copy link
Contributor

@lmolkova lmolkova Jan 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe having a document covering enduser convention as a whole was a critical feedback point.
We're trying to avoid adding attributes without also adding a signal that leverages them.

I'm surprised we don't have any signals to populate these on. Wouldn't we add them on session events?
https://github.com/open-telemetry/semantic-conventions/blob/f7362c7066856ff8591ac461d4e3b31ad7af3a4b/docs/general/session.md

If we're going to just stamp them on all telemetry items, let's document it as an attribute group - this would be the place to describe any additional guidance. This doc has to be updated anyway.

I believe it should cover:

  • who populates those attributes - would an HTTP instrumentation do it? Distro? End user apps?
  • when they are populated? Whenever they are known? Is it opt-in? etc
  • how to populate them - where would I get this information from? There are some mappings in existing docs - are they still relevant?
  • is user info propagated to other services (currently docs say it should be using baggage)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

addressed here

@heyams
Copy link
Contributor Author

heyams commented Dec 10, 2024

image

it was a false-negative. that link works. how do i ignore this failure?

@trask
Copy link
Member

trask commented Dec 10, 2024

it was a false-negative. that link works. how do i ignore this failure?

it's ok, it's not a required check, we're working on resolving the flakiness

Comment on lines +18 to +22
- ref: enduser.authentication.id
requirement_level: required
note: >
The `enduser.authentication.id` attribute is intended to provide an unique identifier of an authenticated enduser.
The deprecated attributes `enduser.authentication.role` and `enduser.authentication.scope` are removed from the enduser registry.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking that enduser.id would be the normal (authenticated) user id, do I remember that correctly? thanks

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#1456 (comment)
we decided to use enduser.authentication.id for the authenticated user id since the beginning of this discussion.

renaming enduser.id to something specific (like enduser.pseudo.id) so that user don't put authenticated user id under this attribute.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you summarize the reason for using enduser.authentication.id instead of enduser.id?

does this mean we will prohibit future "embedding" of user.id into the enduser.* namespace? or will we have 3 "id" attributes including enduser.id?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we decided to have a sub-namespace called authentication so that we can add other attributes to do with authentication in the future. enduser.id is renamed to enduser.pseudo.id for unauthenticated/unauthorized/anonymous or any other kind of tracking id as long as it's not authenticated.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we decided to have a sub-namespace called authentication so that we can add other attributes to do with authentication in the future.

I don't see any discussion around it or some motivation to introduce enduser.authentication.id. Is there a link? Where did it come from?
If we need to add authentication namespace with some other properties in the future, we can always do it.

The enduser.id seems general and concise. I'd prefer to use it unless there are strong reasons not to

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that was the 1st thing i did on this long-discussed thread.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#1146 was never approved

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but this PR is based on that PR.

Copy link

github-actions bot commented Jan 5, 2025

This PR was marked stale due to lack of activity. It will be closed in 7 days.

@github-actions github-actions bot added Stale and removed Stale labels Jan 5, 2025
Given the sensitive nature of this information, SDKs and exporters SHOULD drop these attributes by
default and then provide a configuration parameter to turn on retention for use cases where the
information is required and would not violate any policies or regulations.
`enduser.pseudo.id` attribute can be set by a specific client component, e.g. through a cookie out of the Span's HTTP request headers. Client side application should be able to stamp this attribute on any telemetry item emitted by the application whenever this cookie is available.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lmolkova please let me know your thoughts on this statement.

Copy link
Contributor

@lmolkova lmolkova Jan 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if my suggestion is factually correct - please adjust it, but I'd phrase it differently - something along the following lines:

Enduser attributes capture end user identity. They are likely to contain PII and should be populated, processed, and stored with caution.
Information about the end user is usually available on the client side (in a mobile or browser application). Enduser attributes are populated by the user application in coordination with OpenTelemetry SDK. Some OpenTelemetry distributions auto-collect this information from HTTP cookies.
When user information is available, it's RECOMMENDED to add it to all spans and events emitted in the scope of operation initiated by this user.

Application in coordination with OpenTelemetry SDK and Distro MAY propagate user information from the client application to the front end and across different backend services using custom HTTP cookies and/or Baggage.

Enduser information is collected and populated manually by user application or specialized components,
other instrumentations such as HTTP or RPC are not expected to populate these attributes by default.

<!-- TODO: add link to the implementation in otel-js-contirb -->

I would also put it above the table.

I think it should also replace the content of lines 413-415 above

It is expected this information would be propagated unchanged from node-to-node within the system
using the Baggage mechanism. These attributes should not be used to record system-to-system
authentication attributes.

requirement_level: required
note: >
The `enduser.authentication.id` attribute is intended to provide an unique identifier of an authenticated enduser.
The deprecated attributes `enduser.authentication.role` and `enduser.authentication.scope` are removed from the enduser registry.
Copy link
Contributor

@lmolkova lmolkova Jan 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
The deprecated attributes `enduser.authentication.role` and `enduser.authentication.scope` are removed from the enduser registry.

I don't think that's accurate and also there is no need to mention changes in yaml that have no effect on markdown.

@@ -2,23 +2,27 @@ groups:
- id: registry.enduser.deprecated
type: attribute_group
display_name: Deprecated End User Attributes
brief: Describes deprecated enduser attributes. Complete enduser namespace has been deprecated
brief: "Describes deprecated end user attributes."
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
brief: "Describes deprecated end user attributes."
brief: "Describes deprecated enduser attributes."

attributes:
- id: enduser.id
type: string
brief: 'Deprecated, use `enduser.pseudo.id` instead.'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe enuser.id should stay as it is. Even if we call it enduser.authentication.id, the old enduser.id matches authenticated one.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the main reason we came up with enduser.pseudo.id or a list of other naming options because we don't want customers to put authenticated user id to enduser.id. authenticated one is using enduser.authentication.id.

Copy link
Contributor

@lmolkova lmolkova Jan 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the ask is to keep enduser.id instead of adding enduser.authentication.id. This PR would add enduser.pseudo.id and un-deprecate enduser.id

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that wasn't what we discussed though. what about the sub-namespace authentication? i thought that is what we want to do so that any other attributes associated with authentication can be added later.

if you want to change it now, that will be different. we would need to re-discuss it. i thought the consensus is to have authentication sub namespace.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#1104 (comment) it was from you @lmolkova

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's been under the assumption that we use user.id for pseudo/tracking/anonymous id - as you can see we've reconsidered it after that.

if you want to change it now, that will be different. we would need to re-discuss it.

It seems at least me and @trask are on the same page to use enduser.id instead of enduser.authentication.id. I believe I saw some support from @trisch-me and @jsuereth on it too.

Please make the change and if anyone has objections they will be able to comment and provide reasoning.

- id: enduser.authentication.id
type: string
brief: "Unique identifier of an authenticated user in the system."
examples: [ 'S-1-5-21-202424912787-2692429404-2351956786-1000' ]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it seems to be an example of windows OS user (SID). Given that this attribute is intended to represent human identity, we should use a realistic example (login? email? guid?)

Also, we should add a note that it's likely a PII. I have a proposal on how to capture it (until we have a formal way in #1707)

note: |
...
> [!WARNING]
>
> This attribute contains sensitive (PII) information.
```

attributes:
- id: enduser.pseudo.id
type: string
stability: experimental
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we use development now, let's change it here and in other non-deprecated attributes

Suggested change
stability: experimental
stability: development

type: attribute_group
display_name: End User Attributes
brief: >
Describes information about the end user, which can be used as a subdomain of browser, client, or user domains.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Describes information about the end user, which can be used as a subdomain of browser, client, or user domains.
Describes the end user.

I don't think the other part is correct:

which can be used as a subdomain of browser, client, or user domains.

brief: "Deprecated, use `user.id` instead."
examples: 'username'
deprecated: "Replaced by `enduser.pseudo.id`."
examples: ['QdH5CAWJgqVT4rOr0qtumf']
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please bring the old example back

requirement_level: recommended
- ref: enduser.pseudo.id
requirement_level: recommended
- ref: enduser.role
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please remove it from this group - it's listed as deprecated attribute and is not necessary here

- ref: enduser.role
deprecated: "Removed."
requirement_level: recommended
- ref: enduser.scope
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here, please remove the reference

- id: identity
type: attribute_group
brief: >
These attributes may be used for any operation with an authenticated and/or authorized enduser.
Copy link
Contributor

@lmolkova lmolkova Jan 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
These attributes may be used for any operation with an authenticated and/or authorized enduser.
Describes end user identity.

we support unauthenticated users too, correct?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, enduse.pseudo.id can be unauthenticated, unauthorized, anonymous, and etc.. as long as it's not authenticated, which is tracked under enduser.authentication.id.

deprecated: "Removed."
requirement_level: recommended
- ref: enduser.authentication.id
requirement_level: required
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it can be required since it's likely to contain PII

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will remove it

@@ -410,34 +414,7 @@ system. It is expected this information would be propagated unchanged from node-
using the Baggage mechanism. These attributes should not be used to record system-to-system
authentication attributes.

Examples of where the `enduser.id` value is extracted from:

| Authentication protocol | Field or description |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer to keep this table - it actually explains what the enduser.id is - we don't provide an explanation like this anymore. Is this information still accurate and can it be used to capture enduser.id ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

enduser.id has been replaced with 'enduser.pseudo.id`. i don't think it makes sense to keep it.

@heyams
Copy link
Contributor Author

heyams commented Jan 14, 2025

For people who are flip-flopping on what we discussed about sub namespace authentication, a friendly reminder to revisit these links:

#1104 (comment)
#1104 (comment)

#1172
image

@trask
Copy link
Member

trask commented Jan 15, 2025

For people who are flip-flopping on what we discussed about sub namespace authentication, a friendly reminder to revisit these links

hi @heyams, I don't see any arguments in those links for specifically preferring enduser.authenticated.id over enduser.id?

in order to help this PR move forward, can you write out the reasons you have for specifically preferring enduser.authenticated.id over enduser.id? (other than that you thought it was decided already)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

User.id for authenticated user id
8 participants