Define a common algorithm for service.instance.id #312

Merged
merged 31 commits into from
Feb 23, 2024
Commits
62d2150
Define a common algorithm for service.instance.id
jpkrohling Sep 12, 2023
fefca87
lint
jpkrohling Sep 12, 2023
f18b771
add pid to namespace/name for kubernetes
jpkrohling Sep 12, 2023
8d88ff5
add service.name to machine-id rule
jpkrohling Sep 12, 2023
7133167
use other combinations for uniquely identifying instances
jpkrohling Sep 12, 2023
76ca26f
more clarifications
jpkrohling Sep 13, 2023
67c41d0
determine the UUID's namespace
jpkrohling Sep 13, 2023
e7b3256
fix linter
jpkrohling Sep 13, 2023
f050618
final clarifications, rebase
jpkrohling Oct 19, 2023
43a6424
s/infered/inferred
jpkrohling Oct 19, 2023
e3a3a12
address users of app servers
jpkrohling Oct 19, 2023
4a39b3e
Update with example algorithm in Go
jpkrohling Dec 6, 2023
9d9d1eb
lint
jpkrohling Dec 6, 2023
b7339e2
further linting
jpkrohling Dec 6, 2023
f184348
service ID MUST wording
jpkrohling Dec 8, 2023
401c260
make namespace optional in some steps of the algorithm
jpkrohling Dec 8, 2023
10e9455
explicit order of priorities
jpkrohling Dec 12, 2023
bc1ae47
markdownlint
jpkrohling Dec 13, 2023
b2d8a0f
addressed review comments
jpkrohling Jan 15, 2024
84a8e4b
addressed most of Josh's comments
jpkrohling Jan 17, 2024
671c45d
regenerate
jpkrohling Jan 17, 2024
21fa616
tweaks based on the latest revies
jpkrohling Jan 23, 2024
e990595
table-generation
jpkrohling Jan 23, 2024
f2bfc7e
uuidv4 explicit value
jpkrohling Feb 13, 2024
ad3cb3f
add changelog entry
jpkrohling Feb 13, 2024
e3b0b3a
add chglog file
jpkrohling Feb 13, 2024
becd802
add erlang:node()
jpkrohling Feb 13, 2024
dd68f40
simplified the proposal
jpkrohling Feb 14, 2024
bcbf475
rewrote paragraph about collector
jpkrohling Feb 21, 2024
6b8421c
further clarified collector and containers
jpkrohling Feb 22, 2024
0538f40
Merge branch 'main' into jpkrohling/issue311
jsuereth Feb 23, 2024
4 changes: 4 additions & 0 deletions .chloggen/service-instance-id.yaml
@@ -0,0 +1,4 @@
change_type: 'enhancement'
component: resource
note: Define a common algorithm for `service.instance.id`.
issues: [312]
29 changes: 27 additions & 2 deletions docs/resource/README.md
@@ -99,10 +99,35 @@ as specified in the [Resource SDK specification](https://github.com/open-telemet
<!-- semconv service_experimental -->
| Attribute | Type | Description | Examples | Requirement Level |
|---|---|---|---|---|
| `service.instance.id` | string | The string ID of the service instance. [1] | `my-k8s-pod-deployment-1`; `627cc493-f310-47de-96bd-71410b7dec09` | Recommended |
| `service.instance.id` | string | The string ID of the service instance. [1] | `627cc493-f310-47de-96bd-71410b7dec09` | Recommended |
| `service.namespace` | string | A namespace for `service.name`. [2] | `Shop` | Recommended |

**[1]:** MUST be unique for each instance of the same `service.namespace,service.name` pair (in other words `service.namespace,service.name,service.instance.id` triplet MUST be globally unique). The ID helps to distinguish instances of the same service that exist at the same time (e.g. instances of a horizontally scaled service). It is preferable for the ID to be persistent and stay the same for the lifetime of the service instance, however it is acceptable that the ID is ephemeral and changes during important lifetime events for the service (e.g. service restarts). If the service has no inherent unique ID that can be used as the value of this attribute it is recommended to generate a random Version 1 or Version 4 RFC 4122 UUID (services aiming for reproducible UUIDs may also use Version 5, see RFC 4122 for more recommendations).
**[1]:** MUST be unique for each instance of the same `service.namespace,service.name` pair (in other words
`service.namespace,service.name,service.instance.id` triplet MUST be globally unique). The ID helps to
distinguish instances of the same service that exist at the same time (e.g. instances of a horizontally scaled
service).

Implementations, such as SDKs, are recommended to generate a random Version 1 or Version 4 [RFC
4122](https://www.ietf.org/rfc/rfc4122.txt) UUID, but are free to use an inherent unique ID as the source of
this value if stability is desirable. In that case, the ID SHOULD be used as the source (the name input) of a
Version 5 UUID, which SHOULD use the following UUID as its namespace: `4d63009a-8d0f-11ee-aad7-4c796ed8e320`.
Member

don't understand what it means to "use UUID as the namespace"

Member Author

UUID v5 has a namespace. Previous versions of the PR had an example showing this, but perhaps this would help see where this namespace is used: https://pkg.go.dev/github.com/google/uuid#NewSHA1

Member

The RFC also explains how namespace is used to generate a v5 UUID: https://datatracker.ietf.org/doc/html/rfc4122#page-13
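
To illustrate the namespace question in this thread: a Version 5 UUID is computed by hashing a namespace UUID together with a name, so "using a UUID as the namespace" means passing it as the first input of that hash. The following is a minimal sketch in Python using the standard library's `uuid.uuid5`. The function names and the sample input are hypothetical and are not part of the proposal.

```python
import uuid

# Namespace UUID quoted in the proposed text for service.instance.id.
SERVICE_INSTANCE_NAMESPACE = uuid.UUID("4d63009a-8d0f-11ee-aad7-4c796ed8e320")


def service_instance_id(inherent_id: str) -> str:
    """Derive a stable service.instance.id from an inherent unique ID.

    uuid5 hashes the namespace bytes followed by the name with SHA-1,
    so the same (namespace, name) pair always yields the same UUID.
    """
    return str(uuid.uuid5(SERVICE_INSTANCE_NAMESPACE, inherent_id))


def random_service_instance_id() -> str:
    """Fallback when no inherent ID is available: a random Version 4 UUID."""
    return str(uuid.uuid4())


if __name__ == "__main__":
    # "hypothetical-pod-name" is an illustrative input, not a value from the spec.
    first = service_instance_id("hypothetical-pod-name")
    second = service_instance_id("hypothetical-pod-name")
    print(first == second)           # True: same namespace and name
    print(uuid.UUID(first).version)  # 5
```

Since the namespace is fixed, any implementation that feeds the same inherent ID into a Version 5 UUID with this namespace should arrive at the same value, while the resulting ID does not reveal the input in plain text.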


UUIDs are typically recommended because identifying a service instance requires only an opaque value. Similar
to what the man page for the
[`/etc/machine-id`](https://www.freedesktop.org/software/systemd/man/machine-id.html) file describes, the
underlying data, such as pod name and namespace, should be treated as confidential; it is the user's choice
whether to expose it via another resource attribute.

For applications running behind an application server (like unicorn), we do not recommend using one identifier
for all processes participating in the application. Instead, it's recommended that each division (e.g. a worker
thread in unicorn) have its own instance.id.
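
One way to give each division its own ID can be sketched as follows, assuming a pre-forking server in which every worker is a separate process. The `WorkerInstanceId` class is hypothetical: it caches a random Version 4 UUID and regenerates it when the process ID changes, so a forked worker does not keep reporting the ID it inherited from its parent.

```python
import os
import uuid
from typing import Optional


class WorkerInstanceId:
    """Caches one random instance ID per process (hypothetical helper)."""

    def __init__(self) -> None:
        self._pid: Optional[int] = None
        self._value: Optional[str] = None

    def get(self) -> str:
        pid = os.getpid()
        # After a fork the child sees a different PID than the one cached by
        # the parent, so it generates a fresh ID instead of reusing the old one.
        if self._value is None or self._pid != pid:
            self._value = str(uuid.uuid4())
            self._pid = pid
        return self._value


if __name__ == "__main__":
    ids = WorkerInstanceId()
    print(ids.get() == ids.get())  # True: stable within a single process
```

A server that divides work by threads rather than processes would need a different key than the process ID; this sketch does not cover that case.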

It's not recommended for a Collector to set `service.instance.id` if it can't unambiguously determine the
service instance that is generating that telemetry. For instance, creating a UUID based on `pod.name` will
likely be wrong, as the Collector might not know from which container within that pod the telemetry originated.
However, Collectors can set the `service.instance.id` if they can unambiguously determine the service instance
for that telemetry. This is typically the case for scraping receivers, as they know the target address and
port.
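
The Collector guidance can be sketched as a function that only sets an ID when the scrape target is known. This is an illustration under stated assumptions: the `collector_instance_id` name and the `address:port` name format are hypothetical, and the proposal does not prescribe how a Collector should build the input.

```python
import uuid
from typing import Optional, Tuple

# Namespace UUID quoted in the proposed text for service.instance.id.
SERVICE_INSTANCE_NAMESPACE = uuid.UUID("4d63009a-8d0f-11ee-aad7-4c796ed8e320")


def collector_instance_id(target: Optional[Tuple[str, int]]) -> Optional[str]:
    """Return a service.instance.id for scraped telemetry, or None.

    `target` is the (address, port) pair a scraping receiver connected to.
    When the source cannot be determined unambiguously, None is returned and
    the attribute should be left unset.
    """
    if target is None:
        return None
    address, port = target
    # Assumption: "address:port" serves as the inherent unique ID of the target.
    return str(uuid.uuid5(SERVICE_INSTANCE_NAMESPACE, f"{address}:{port}"))


if __name__ == "__main__":
    print(collector_instance_id(None))  # None: source is ambiguous
    same = collector_instance_id(("10.0.0.7", 9100)) == collector_instance_id(("10.0.0.7", 9100))
    print(same)  # True: same target, same ID
```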

**[2]:** A string value having a meaning that helps to distinguish a group of services, for example the team name that owns a group of services. `service.name` is expected to be unique within the same namespace. If `service.namespace` is not specified in the Resource then `service.name` is expected to be unique for all services that have no explicit namespace defined (so the empty/unspecified namespace is simply one more valid namespace). Zero-length namespace string is assumed equal to unspecified namespace.
<!-- endsemconv -->
41 changes: 28 additions & 13 deletions model/resource/service_experimental.yaml
@@ -22,16 +22,31 @@ groups:
type: string
brief: >
The string ID of the service instance.
note: >
MUST be unique for each instance of the same `service.namespace,service.name` pair
(in other words `service.namespace,service.name,service.instance.id` triplet MUST be globally unique).
The ID helps to distinguish instances of the same service that exist at the same time
(e.g. instances of a horizontally scaled service). It is preferable for the ID to be persistent
and stay the same for the lifetime of the service instance, however it is acceptable that
the ID is ephemeral and changes during important lifetime events for the service
(e.g. service restarts).
If the service has no inherent unique ID that can be used as the value of this attribute
it is recommended to generate a random Version 1 or Version 4 RFC 4122 UUID
(services aiming for reproducible UUIDs may also use Version 5, see RFC 4122
for more recommendations).
examples: ["my-k8s-pod-deployment-1", "627cc493-f310-47de-96bd-71410b7dec09"]
note: |
MUST be unique for each instance of the same `service.namespace,service.name` pair (in other words
`service.namespace,service.name,service.instance.id` triplet MUST be globally unique). The ID helps to
distinguish instances of the same service that exist at the same time (e.g. instances of a horizontally scaled
service).

Implementations, such as SDKs, are recommended to generate a random Version 1 or Version 4 [RFC
4122](https://www.ietf.org/rfc/rfc4122.txt) UUID, but are free to use an inherent unique ID as the source of
this value if stability is desirable. In that case, the ID SHOULD be used as the source (the name input) of a
Version 5 UUID, which SHOULD use the following UUID as its namespace: `4d63009a-8d0f-11ee-aad7-4c796ed8e320`.

UUIDs are typically recommended because identifying a service instance requires only an opaque value. Similar
to what the man page for the
[`/etc/machine-id`](https://www.freedesktop.org/software/systemd/man/machine-id.html) file describes, the
underlying data, such as pod name and namespace, should be treated as confidential; it is the user's choice
whether to expose it via another resource attribute.

For applications running behind an application server (like unicorn), we do not recommend using one identifier
for all processes participating in the application. Instead, it's recommended that each division (e.g. a worker
thread in unicorn) have its own instance.id.

It's not recommended for a Collector to set `service.instance.id` if it can't unambiguously determine the
service instance that is generating that telemetry. For instance, creating a UUID based on `pod.name` will
likely be wrong, as the Collector might not know from which container within that pod the telemetry originated.
However, Collectors can set the `service.instance.id` if they can unambiguously determine the service instance
for that telemetry. This is typically the case for scraping receivers, as they know the target address and
port.
examples: ["627cc493-f310-47de-96bd-71410b7dec09"]