Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BREAKING: rename http.client_ip to http.forwarded.for and introduce http.forwarded.proto attributes #3218

Closed
wants to merge 7 commits into from

Conversation

lmolkova
Copy link
Contributor

@lmolkova lmolkova commented Feb 16, 2023

Fixes #2338

http.client_ip attribute does not work well with new(ish) forwarded header definition:

  • in the RFC, for component is a list of nodes, which can be IP addresses, IP:port combinations, or an obfuscated identifiers. This makes client_ip not precise as a node is not necessarily an IP.
  • there is a desire to add other forwarded components: proto or host. Having a descriptive namespace to populate all of them is preferable. Adding http.client_proto and http.client_host could be confusing.

Changes

  • [BREAKING] renames http.client_ip to http.forwarded.for
  • introduces http.forwarded.proto

Full forwarded value can be captured by enabling collection of the corresponding header (http.request.header.forwarded), same goes for x-forwarded-* headers.

Additional context:

  • ECS supports network.forwarded_ip, but it has similar limitations as http.client_ip see discussions in ECS: [meta] Add support for proxies in ECS elastic/ecs#938
  • while reverse proxies support other-than-HTTP protocols, I was not able to find any analogs to forwarded headers. Given that header is specified for HTTP only, it makes sense to put it into http namespace.

Alternative attribute name:

since for component in forwarded header is not a single address, but a list of them, it could be better to call the corresponding attribute http.forwarded.addr

@lmolkova lmolkova added area:semantic-conventions Related to semantic conventions semconv:HTTP labels Feb 16, 2023
semantic_conventions/trace/http.yaml Outdated Show resolved Hide resolved
semantic_conventions/trace/http.yaml Outdated Show resolved Hide resolved
@lmolkova lmolkova marked this pull request as ready for review February 17, 2023 02:20
@lmolkova lmolkova requested review from a team February 17, 2023 02:20
@SergeyKanzhelev SergeyKanzhelev removed their assignment Feb 18, 2023
@lmolkova lmolkova force-pushed the http-forwarded branch 2 times, most recently from 24ebd34 to 642eb92 Compare February 20, 2023 22:03
@arminru arminru added the spec:trace Related to the specification/trace directory label Feb 21, 2023
@tigrannajaryan tigrannajaryan changed the title Add http.forwarded attributes BREAKING: rename http.client_ip to http.forwarded.for and introduce http.forwarded.proto attributes Feb 28, 2023
@Oberon00
Copy link
Member

Oberon00 commented Mar 2, 2023

I doubt this is a replacement for http.client_ip. Should these attributes maybe be added in addition? There may be sources of client_ip other than these HTTP headers. (weak opinion, I'm not very sure here)

@trask
Copy link
Member

trask commented Mar 2, 2023

I doubt this is a replacement for http.client_ip. Should these attributes maybe be added in addition? There may be sources of client_ip other than these HTTP headers. (weak opinion, I'm not very sure here)

since we aren't sure of the use case for http.client_ip, I'd suggest that we drop it before stability, and we can always add it (or something similar) back in the future

@MSNev
Copy link
Contributor

MSNev commented Mar 3, 2023

I doubt this is a replacement for http.client_ip. Should these attributes maybe be added in addition? There may be sources of client_ip other than these HTTP headers. (weak opinion, I'm not very sure here)

Yep a quick search show lots of different ones

  • X-Client-IP
  • X-Forwarded-For
  • CF-Connecting-IP (Cloudflare)
  • Fastly-Client-IP (Fastly CDN)
  • True-Client-IP (Akamai, Cloudflare)
  • X-Real-IP (Nginx / FastCGI)
  • X-Cluster-Client-IP (Rackspace, Riverbed)

Node npm package request-ip

@trask
Copy link
Member

trask commented Mar 3, 2023

X-Client-Id - I couldn't find a definition of this, so it's hard to say what the semantics of it are.

X-Forwarded-For is covered by this PR's http.forwarded.for

CF-Connecting-IP is "the client IP address connecting to Cloudflare to the origin web server" (which could still be a proxy itself and not the original client), so it doesn't represent http.client_ip anyways, which is "the IP address of the original client behind all proxies". If/when a semconv is needed for CF-Connecting-IP, maybe it makes more sense for it to be under cloudflare.* conventions.

Fastly-Client-IP is similarly defined as "the non-Fastly thing that is making the request to Fastly", which also doesn't match http.client_ip. If/when a semconv is needed for Fastly-Client-IP, maybe it makes more sense for it to be under fastly.* conventions. (or maybe there's some new generalization to be made between fastly and cloudflare)

True-Client-IP - "There is no difference between the True-Client-IP and CF-Connecting-IP headers besides the name of the header"

I didn't look up the others, but I'd still suggest that we drop http.client_ip for now, before stability. We can always add it back later if we discover we were wrong and there's a need for it.

@trask
Copy link
Member

trask commented Mar 3, 2023

one other thought, http.forwarded.for in this PR is still defined as

The address of the original client behind all proxies.

and while typically that will come from the Forwarded or X-Forwarded-For header, it's not saying it can't come from another header.

@lmolkova
Copy link
Contributor Author

lmolkova commented Mar 3, 2023

the problem with client_ip is that forwarded HTTP standard header does not necessarily carries an IP, it rather recommends obfuscating it. Given that forwarded header is part of HTTP standard, we should support it properly.

If we keep client_ip attribute it'd serve unclear and narrow scenarios. forwarded.for allows to put original IP into it including any of the x-something headers (if they represent original IP).

@scheler
Copy link
Contributor

scheler commented Mar 5, 2023

Can http.request.header.<key> not be used for the purpose?

@lmolkova
Copy link
Contributor Author

lmolkova commented Mar 6, 2023

Can http.request.header.<key> not be used for the purpose?

Not really.

here we capture individual components - proto (scheme) and the first address in for component. Also these attributes put a burden on instrumentation to support variety of forwarded, x-forwarded-* and other attributes, hiding this from users and providing attributes by default.

http.request.header.<key> puts all burden on users and requires them to enable it explicitly.

schemas/1.19.0 Outdated Show resolved Hide resolved
@lmolkova lmolkova requested a review from a team March 8, 2023 19:37
@lmolkova lmolkova reopened this Mar 8, 2023
@trask
Copy link
Member

trask commented Mar 10, 2023

@Oberon00 @MSNev @scheler have your concerns above been addressed?

@lmolkova
Copy link
Contributor Author

@MSNev I see you still have concerns, could you please share how you see a perfect solution to the problem of recording original caller maybe-IP ?
Current approach allows to capture any of the headers you mentioned above under http.forwarded.for attribute.

@MSNev
Copy link
Contributor

MSNev commented Mar 10, 2023

@trask @Oberon00 @MSNev Nev Wylie FTE @scheler have your concerns above been addressed?

Simplistically No, but I'm not going to the extreme and block the PR.

My main concerns are that people are / may already using this value and based on the existing comments it is "suggested" that this is the closest (single) field which identifies the actual client ip (as far as can be determined). While the new wording explicitly prohibits setting this value if nothing exists. Meaning there will now be no single location for anyone to "check".

There is also an implicit assumption introduced by renaming this to "forward.for", the user is always behind some form of intervening proxy (which will not "always" be the case), which is reinforced by the "SHOULD NOT" set. So this (forces) complications for backend / queries in that now they need to always check this new field AND net.sock.peer.addr (because forward.for should not be set while client_ip calls out if they are the same).

Additionally, and as with a lot of the existing OpenTelemetry implementations, this reinforces the server side view (receiver) within the API and the semantic conventions, ignoring how any existing usages for "senders" (clients, who "know" their own IP address (public or not) -- eg. in-house private network monitoring), this use case does not map to forward.for unless you squint and say I sent a request me (local instrumentations that wrap "sending" API's and create Span's, Logs' or Events for this).

Keeping http.client_ip would allow the instrumentations to provide the "real" client ip as part of the request (in these cases) as the location is put this "known" value.

since we aren't sure of the use case for http.client_ip, I'd suggest that we drop it before stability, and we can always add it (or something similar) back in the future

So I disagree.

My preference would be to keep the http.client_ip as the final "container" for what is believed to be the real client ip (after resolving all sources) keeping similar wording to what already exists...

http.client_ip when it's the same as net.sock.peer.addr means that one is at least somewhat confident that the address is not that of the closest proxy.

While still adding the forwarded.for to support this known use case of the forwarding headers.

In terms of similar prior-art for this scenario ECS does provide the forward-for (which is where I assume this came from, AND the client address / ip (albeit in it's own general client namespace).

So to summarize my preference

  • keep http.client_ip
  • If/when ECS and the client namespace is merged, then at that point I would be supportive of a possible renaming of this field.

@lmolkova
Copy link
Contributor Author

lmolkova commented Mar 10, 2023

I hear several concerns, please correct me if I miss something:

  1. we don't have an attribute to represent 'best known client address', which could be either behind proxy or a direct peer. Note that this PR does not change status quo on this front, http.client_ip represented IP behind proxy before it.
  2. the client-side concern. I'm not sure how it applies to server-only http.client_ip attribute.
  3. breaking change. It's still an experimental spec and also our last chance to fix naming. there likely will be more breaking-changes coming.

Additionally, client_ip is not a good name for something that's not necessarily an IP.

So, it's possible that we'll need to

  1. introduce other attributes that serve best-known-original-client-address in the future
  2. OR rename and repurpose the http.client_ip aka http.forwarded.for to represent that one now.

@Oberon00
Copy link
Member

Oberon00 commented Mar 13, 2023

I can't really add more in this discussion I think, I'm not deep enough in the topic. Just one remark:

While the new wording explicitly prohibits setting this value if nothing exists.

I think client_ip already must not be set if you have no information beyond net.sock.peer.ip.

@github-actions
Copy link

This PR was marked stale due to lack of activity. It will be closed in 7 days.

one is at least somewhat confident that the address is not that of
the closest proxy.
examples: '83.164.160.102'
This attribute SHOULD be set when a source of information different
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
This attribute SHOULD be set when a source of information different
This attribute SHOULD be set when the source of information is different

`http.client_ip` when it's the same as `net.sock.peer.addr` means that
one is at least somewhat confident that the address is not that of
the closest proxy.
This attribute SHOULD be set when a source of information different
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spoke with trask offline around this to highlight the "source" nature of this description.
I would suggest adding or keeping the wording to further highlight the source (vs value) of the obtain information.

eg. I think the following in the existing clearly states that this should be set
is available even if that other source just confirms the same value

while I originally mis-read the updated version to indicate that the value should not be set if it's the same as net.sock.peer.addr

@MSNev
Copy link
Contributor

MSNev commented Apr 5, 2023

The only other remaining item is the breaking nature of this change, especially now that with merging with ECS that this would likely change yet again. If this change is delayed and included in the bigger set of breaking (ECS merge) changes I think this would be better and then my concerns would be reduced.

And for additional reference, today for Node.JS the http instrumentation populates the client_ip (when the X-Forwarded-For is present) and this is the primary way network requests are tracked. Which means that EVERY OTel node user is currently receiving / using this today (even though it's not marked as stable). So breaking this and then forcing an update followed by another update when ECS would be problematic for consumers.

@trask
Copy link
Member

trask commented Apr 6, 2023

we discussed in HTTP semconv stability WG and agreed that this change should be bundled with the other HTTP semconv breaking changes and covered by the transition period we are defining in #3362

@github-actions
Copy link

This PR was marked stale due to lack of activity. It will be closed in 7 days.

@github-actions github-actions bot added the Stale label Apr 13, 2023
@trask trask removed the Stale label Apr 13, 2023
@github-actions
Copy link

This PR was marked stale due to lack of activity. It will be closed in 7 days.

@github-actions github-actions bot added the Stale label Apr 26, 2023
@lmolkova
Copy link
Contributor Author

closing this PR:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:semantic-conventions Related to semantic conventions semconv:HTTP spec:trace Related to the specification/trace directory Stale
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Is it ok for http.scheme to factor in forwarded "proto" field / x-forwarded-proto headers?
9 participants