Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BREAKING: Limit HTTP method values to closed set (for cardinality reason) #3478

Closed
wants to merge 17 commits into from

Conversation

lmolkova
Copy link
Contributor

@lmolkova lmolkova commented May 5, 2023

Fixes #3470

Changes

  • restricts http.method values to well-known method names
  • allows to optionally support any closed set of methods
  • required to use OTHER for unknown methods to prevent cardinality explosion if a malicious (or buggy) client sends custom and dynamic methods.

@lmolkova lmolkova force-pushed the http-method-closed-set branch from 4d788ff to 1a2ebfc Compare May 5, 2023 20:04
semantic_conventions/http-common.yaml Outdated Show resolved Hide resolved
semantic_conventions/http-common.yaml Outdated Show resolved Hide resolved
semantic_conventions/http-common.yaml Outdated Show resolved Hide resolved
@lmolkova lmolkova marked this pull request as ready for review May 5, 2023 20:35
@lmolkova lmolkova requested review from a team May 5, 2023 20:35
@Oberon00
Copy link
Member

Oberon00 commented May 6, 2023

One thing to note is that this will move us away from both the old conventions and ECS: https://www.elastic.co/guide/en/ecs/current/ecs-http.html#field-http-request-method

What are your thoughts on instead simply adding a warning that the HTTP method cannot be relied upon to have low cardinality?

If we go with this new solution, for stabilization, should wording be added like "treatment of non-well-known HTTP methods is experimental"?

@arminru arminru added semconv:HTTP area:semantic-conventions Related to semantic conventions spec:metrics Related to the specification/metrics directory spec:trace Related to the specification/trace directory labels May 8, 2023
@lmolkova lmolkova force-pushed the http-method-closed-set branch from f9921a9 to 789d5da Compare May 8, 2023 20:08
@lmolkova
Copy link
Contributor Author

lmolkova commented May 8, 2023

One thing to note is that this will move us away from both the old conventions and ECS: https://www.elastic.co/guide/en/ecs/current/ecs-http.html#field-http-request-method

What are your thoughts on instead simply adding a warning that the HTTP method cannot be relied upon to have low cardinality?

If we go with this new solution, for stabilization, should wording be added like "treatment of non-well-known HTTP methods is experimental"?

great catch - I did not notice ECS allows any case to be used. I think doing this makes metrics and manual query experience worse. I.e. as a consumer, if I want to group by method name or show metrics for specific method on server side, I depend on my clients calling me with the right method. If I don't reject the wrong methods right away, I'd have to write my queries with this in mind.

So, if we document that http.*method is one of the standard headers (in common case), it should match exactly. And then we should either require instrumentations to normalize it or as @Oberon00 mentioned in the comment, map invalid case to OTHER

@AlexanderWert wdyt?

@reyang
Copy link
Member

reyang commented May 9, 2023

@lmolkova heads up - most likely this PR will be closed, and we'll ask you to resubmit the PR in a new repo, please refer to #3474 (comment).

@ruflin
Copy link

ruflin commented May 9, 2023

I tried to find the history around allowing all cases for http.request.method in ECS as I remember we had a very similar discussion, unfortunately I wasn't successful as there was a squash of many commits before we shipped 1.0 :-( If I remember correctly, the reasoning was that in Elasticsearch you can easily use a lower case token filter to get the expected result without requiring to check each even if it is accurate. Having said that, I think we should encourage everyone to use the RFC9110.

One other difference, here http.method is discussed, but in ECS it is http.request.method.

@Oberon00
Copy link
Member

Oberon00 commented May 9, 2023

@lmolkova

require instrumentations to normalize it

How would you normalize this? The HTTP method is case sensitive.

$ curl -X Get https://example.org
<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
         "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
        <head>
                <title>501 - Not Implemented</title>
        </head>
        <body>
                <h1>501 - Not Implemented</h1>
        </body>
</html>

@epixa
Copy link

epixa commented May 9, 2023

ECS did used to require that http.request.method be lowercase for the same reasons outlined in this PR, but it turned out to be problematic because the original casing of the request is itself a point of data, one which can be helpful in the context of some security detection rules. ECS didn't even constrain the field to known values which could be even more lossy.

Since we already went through the cycle of limiting http.request.method and it impacted enough people negatively that we reverted it and made a breaking change in ECS to get to the current state, I recommend not repeating that cycle. It sounds like this is a problem for metrics use cases though, so perhaps a new field entirely to support the metrics use case is ideal?

@Oberon00
Copy link
Member

Oberon00 commented May 9, 2023

This reminds me that we have the same issue with the HTTP status code: open-telemetry/semantic-conventions#1056
I think the approach proposed here should also work for the method, i.e. add a new attribute http.request.restricted_method or similarly named, that will be filled out only for metrics (but could be filled out anywhere)

@lmolkova
Copy link
Contributor Author

lmolkova commented May 9, 2023

How would you normalize this? The HTTP method is case sensitive.

Apparently, some web frameworks don't think HTTP methods are case-sensitive.

E.g. Flask

> GeT / HTTP/1.1
> Host: localhost:5000
>
< HTTP/1.1 200 OK
< Server: Werkzeug/2.3.4 Python/3.10.7

or ASP.NET Core

> GeT / HTTP/1.1
> Host: localhost:5051
>
< HTTP/1.1 200 OK
< Server: Kestrel

while others (I tried Spring and express) consider them case-sensitive and return errors.

I can easily see how ASP.NET Core or Flask instrumentations, knowing that they consider methods to be case-insensitive, normalize Get to GET.

I added a note on case sensitivity which should not block such normalization, but that implies that in common case Get should be mapped to OTHER. I'm also proposing a new attribute http.request.original_method to capture the exact method passed by the client.

@epixa, @ruflin do you think this would be a reasonable solution?

Why I believe common http.request.method across all signals is beneficial

  • I assume that most HTTP clients produce standard methods only and mismatch is an edge case
  • Different attributes for traces and metrics make it harder for users to query data and for traces/logs-to-metrics scenarios

Essentially, we should not make the default experience worse to address an edge case.

@lmolkova
Copy link
Contributor Author

lmolkova commented May 9, 2023

@lmolkova heads up - most likely this PR will be closed, and we'll ask you to resubmit the PR in a new repo, please refer to #3474 (comment).

@reyang thanks for the heads up and no problem! We can continue the discussion here even if PR is closed and then I'll resumbit the result of it to the new repo.

@ruflin
Copy link

ruflin commented May 10, 2023

@lmolkova I agree mostly with your analysis above and especially "we should not make the default experience worse to address an edge case". But I'm not sure if we should introduce an additional field for it as part of the schema. Instead I think we should "strongly recommend" using everything upper case but allow the edge cases. As this is going to be used by different storage systems, we don't know if there is / can be automation in place to make it all upper case and having Get under OTHER sounds also confusing to me.

If we need http.request.original_method in addition, I would only make it a recommendation and not necessarily part of the schema. I'm in general concerned that the more fields there are, the harder it gets for others to adopt the schema and figure out what they should populate on what not. As a side note: There is a concept of core and extended fields in ECS which we eventually also need to think through in bringing all the fields in.

@Oberon00
Copy link
Member

But isn't the edge case here an HTTP client making a request in non-uppercase? This whole issue is about edge cases basically.

@lmolkova
Copy link
Contributor Author

lmolkova commented May 10, 2023

@ruflin

Instead I think we should "strongly recommend" using everything upper case but allow the edge cases. As this is going to be used by different storage systems, we don't know if there is / can be automation in place to make it all upper case and having Get under OTHER sounds also confusing to me.

the only place where uppercasing makes sense is the instrumentation which knows it's valid to uppercase. E.g. Get sent to node.js express application returns 400 - Bad Request. Uppercasing it would make telemetry invalid - GET request would return 200 and it'd be unclear why 400 was returned.

So we need to capture two pieces of information:

  • known method name or some version of OTHER. (Would UNKNOWN or INVALID resolve your concerns with Get captured as OTHER ?)
  • the actual method

If we capture method with the original casing, a malicious client can explode number of HTTP metrics and break grouping by span names which are also expected to be of low cardinality.
With this PR we have 10 possible values and with all the case variations, we'd have 400 possible values for method alone.

So, it seems we can't get away with just one attribute.

@lmolkova
Copy link
Contributor Author

I would only make it a recommendation and not necessarily part of the schema. I'm in general concerned that the more fields there are, the harder it gets for others to adopt the schema and figure out what they should populate on what not. As a side note: There is a concept of core and extended fields in ECS which we eventually also need to think through in bringing all the fields in.

I believe requirement_level is similar to core/extendedconcept.
Here the http.request.method is required, and http.request.original_method is recommended only when http.request.method is set to OTHER, which should be quite rare.

@ruflin
Copy link

ruflin commented May 11, 2023

I think we approach this from 2 different angles here. My focus is on the logs use case where http.request.method somehow shows up in a log message and is extracted. It can be a log message where it is extracted with dissect / grok or also a JSON log where it shows up as http.request.method already but for example has the value GeT. I would expect this value is still accepted from a JSON log and not rejected even though it is not according to the standard.

@Oberon00
Copy link
Member

Oberon00 commented May 11, 2023

Instead of adding original_method with the unaltered method, I'd suggest adding method_kind with the normalized method and keeping method aligned with ECS and old OTel. Then we only need an adjustment to the span name wording to be generated like method_kind, and probably requirement levels for metrics. This would also align with the proposed http.status_code_class from open-telemetry/semantic-conventions#1056

@lmolkova
Copy link
Contributor Author

lmolkova commented May 11, 2023

@Oberon00 method_kind sounds confusing - there is no kind, it's a standard method value as it is. It does not group different methods. I also don't think http.status_code_class is similar - even if introduced, it merges several valid codes into one value.

@ruflin this spec applies to the side that collects telemtery. It explicitly allows custom values, so backends should not reject anything based on the value. Assuming someone calculates metrics from logs, they can do sanitization there and still map GeT to GET or OTHER as they see fit.

Having said this, we don't have semantic conventions for HTTP logs and if we had, http.request.method should still be set consistently and have low cardinality for all signals (for correlation and logs-to-metrics). And http.request.original_method can still have a raw value.

@Oberon00
Copy link
Member

I get your argument, but I'm leaning towards having the "default" name be the actual method, and the normalized method in a new argument. If method_kind is not a good name, it could be normalized_method, restricted_method, method_bucket, method_category, indexable_method, etc.

Main argument for this way being that IMHO it's the least surprising way.

@lmolkova
Copy link
Contributor Author

@Oberon00 I hear your point. My concern here is:

  • we'd then have some version of http.request.normalized_method in the HTTP conventions for traces and metrics.
  • http.request.method will be used in edge cases only - this would be even more surprizing

I.e. we reserve a 'good' attribute to represent an invalid method and use a more complicated attribute for most-common case.
The alternative is to populate both, but then we're duplicating it for the absolute majority of users.
Again, we're prioritizing edge case with this approach.

@epixa
Copy link

epixa commented May 11, 2023

Pardon if I'm misrepresenting anyone here, but I think the conflict is perhaps originating in the different use cases we're advocating for, which might just be an emerging challenge we'll continue to face as we progress with the effort of aligning/merging/adopting ECS. Coming from the SIEM/security side rather than metrics, I don't see the original http method as an edge case... The primary consideration in my mind is accurately reflecting how data is logged and enriching it with additional information where possible that can make it more valuable.

Performing lossy operations like normalizing request methods is certainly OK, but it is additional information (i.e. the context of which "canonical" method the request represents), so it should be treated as an additional field.

We don't have to look very far for real world evidence of this similar issue causing problems for security use cases on logging data - it caused problems for enough people using ECS that we performed a breaking change to revert the normalization recommendation, and breaking changes are something we've tried hard to avoid.

@tsloughter
Copy link
Member

I'd make http.request.original_method a recommendation only for logs. That messes with the goal of being able to create only 1 set of attributes and being able to include that same value in all 3 signals, but still seems like the best solution to surface this information in the appropriate signal.

@ruflin
Copy link

ruflin commented May 12, 2023

@ruflin this spec applies to the side that collects telemtery.

I want to double down on what @epixa mentioned above. The goal of ECS was and now I assume of the otel spec is to have a common schema for ALL signals. Are we all aligned on this goal? If not, we should take it to a separate thread to dig deeper into it as otherwise this will keep popping up.

Getting all 3 signals together in a single schema will mean tradeoffs as there are conflicting priorities but I'm strongly convinced, finding the right trade offs will benefit the users that want to bring all the signals together and only have to remember one schema for querying data.

On my end, I'm getting back to the proposal of having a strong recommendation for the values of the fields but be lenient in accepting other values. Now a tool implementing this could always be stricter if needed to for example prevent a malicious client.

@lmolkova I would like to learn more about how this schema directly affects the receiving of events / preventing malicious values. Can you point me to where this is implemented?

@Oberon00
Copy link
Member

Oberon00 commented May 12, 2023

That messes with the goal of being able to create only 1 set of attributes and being able to include that same value in all 3 signals

Do you see offering two separate attributes for actual vs normalized method as messing with that goal? It would still be one set of attributes, and any of them could be used for any signal. Yes, it adds some "different ways of doing mostly the same thing", which is also not ideal, but as I see it we have these options:

  1. Only report actual method -- means that backends with weaker grouping capabilities may get flooded by malicious/broken clients with different methods. Given the latest new insights, we would need to add a warning to the spec that this cannot be relied upon to be low cardinality and adapt the HTTP span naming recommendations.
  2. Only report normalized method -- means information loss for especially interesting requests and if not done right esp. with upper/lowercase, could lead to confusing situations of overnormalization. Complex to implement on the instrumentation side, since configurability of the list of accepted methods is (IMHO) a basic requirement. I believe we won't get very far with this before someone requests to see the original method and then we are at 3.
  3. Report both or give user choice -- most complicated to configure & specify & implement, but can offer both flooding protection for "vulnerable" backends and full insights on requests with weird or unusual methods.

What I don't understand is why the actual method would only be recommended to be retained on logs. IMHO, metrics are the outlier here were high cardinality is known to cause problems. For traces, I would especially recommend to retain the original method, since that's what you would look at, in addition to logs, to debug outlier requests.

@ruflin
Copy link

ruflin commented May 12, 2023

Excellent summary @Oberon00

Do you see offering two separate attributes for actual vs normalized method as messing with that goal?

No, and to be fair, in ECS there are other places like user_agent.original where we did exactly the same. My general concern is, adding fields is easy, removing / changing fields will be hard and the more fields there are, the harder it gets for devs implementing the fields and consumers of the fields to remember these. I see a potential we will land with the original solution in many places which I don't consider ideal.

On the options proposed by @Oberon00 above, I would rule out 2 which leaves out 1 and 3. My preference is on 1 but I'm biased here from the perspective of having Elasticsearch as the storage engine as it can handle this scenario well.

Copy link
Member

@Oberon00 Oberon00 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given the discussion we already have, I believe this PR is currently inadequate as it lacks even the possibility of reporting the actual method used.
Neither does it require configurability (IMHO should be a MUST requirement, but that part would not be blocking if there was an alternative attribute)

semantic_conventions/http-common.yaml Show resolved Hide resolved
@Oberon00 Oberon00 dismissed their stale review May 12, 2023 09:47

Sorry, I overlooked the original_method in the diff. In that case, I won't block the PR

@tsloughter
Copy link
Member

@Oberon00 a bit, it would mean creating an additional attribute set in your code that includes the attribute. Not a big deal mostly since the other attributes in the set would likely be shared internally, but it would technically mean you need to add an additional attribute when creating the set for the logs.

You are right it is mainly metrics that would be best not to include it, I just mainly see its usefulness to analysis to be in logs.

Copy link
Contributor

@jsuereth jsuereth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lmolkova
Copy link
Contributor Author

@epixa @ruflin @Oberon00 @AlexanderWert
thank you for all the points, I moved the PR to open-telemetry/semantic-conventions#17. Let's continue the discussion there.

And thanks to @epixa for joining HTTP SIG meeting and clarifying concerns from the ECS side!

@lmolkova lmolkova closed this May 12, 2023
@lmolkova
Copy link
Contributor Author

@ruflin

The goal of ECS was and now I assume of the otel spec is to have a common schema for ALL signals. Are we all aligned on this goal? If not, we should take it to a separate thread to dig deeper into it as otherwise this will keep popping up.

That's a great discussion to have! Yes, I believe the goal is to have a consistent story across all signals. However, it might mean different things. E.g. metrics can only have low-cardinality attributes and traces need to have certain attributes to sufficiently describe the underlying stack (DB, HTTP, etc).

We definitely need your expertise to understand how logs semantic conventions might look like for HTTP (and/or security) and other technologies. And in general, to understand what exactly this common schema across signals would mean.

We have a couple of OTel spec meetings where we can discuss things like that, please check out https://github.com/open-telemetry/community#specification-sigs:

  • Semantic Conventions Working Group (every two weeks at 10 am PST on Monday)
  • Specification: General (every Tuesday at 8am)

Would be great to see you, @epixa, @AlexanderWert and anyone else from Elastic there.

@lmolkova
Copy link
Contributor Author

lmolkova commented May 12, 2023

@lmolkova I would like to learn more about how this schema directly affects the receiving of events / preventing malicious values. Can you point me to where this is implemented

OpenTelemetry does not advise or regulate how to consume the data. Currently, we only document the collection side and here's what we enforce on the semantic conventions.

This applies to semantic conventions and indirectly to the instrumentation libraries. Outside of OTel-owned repos there is no mechanism to enforce the compliance.

Users are free to collect data in any way or alter auto-collected telemetry with in-process span or log processors, metric views or with collector processors.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:semantic-conventions Related to semantic conventions semconv:HTTP spec:metrics Related to the specification/metrics directory spec:trace Related to the specification/trace directory
Projects
None yet
Development

Successfully merging this pull request may close these issues.

What to do with non well-known http.method values?
10 participants