Should we use HTTP 503 response for throttling? #26

tigrannajaryan · 2021-11-16T00:03:52Z

The spec says that the server can respond with 503 to throttle the client.

However, reading the HTTP response code may not be possible with some WebSocket clients, e.g. in browsers (where it is disabled for security reasons).

Should we still support this throttling method or avoid it?

pmm-sumo · 2021-11-18T16:57:27Z

Also, although the reason for throttling is slightly different, should we also mention HTTP 429 Too Many Requests next to 503? (I believe the behavior would otherwise be the same as for 503.)

tigrannajaryan · 2021-11-18T17:13:16Z

I could never figure out what the use case for 429 is when 503 achieves exactly the same thing from the client's perspective. 429 is also supposed to return exactly the same Retry-After header. The slight nuance seems to be that 503 means the server is overloaded and unable to respond properly, while 429 says that it could respond but will not because the client has exceeded some sort of quota.

However, I can't think of anything that the Agent could do differently in response to 503 and 429, so I don't know what we gain by adding both to the spec. Perhaps the benefit is in helping with troubleshooting on the Agent side (the Agent can log the response code)?

tigrannajaryan · 2021-11-18T17:49:16Z

I experimented a bit with 503 responses in Go and JS.

In the Go WebSocket client (Gorilla WebSocket) the response code is clearly accessible, so there is no problem here: it can be used to indicate throttling.

In JS the 503 response results in an "error" event. This event is indistinguishable from a connection error when the server is down. This seems to be OK, since the protocol defines that exponential backoff should be used if the connection cannot be established, which is essentially what we also want to happen for 503 responses (minus the ability to specify an exact retry interval). So, for JS it also seems to be OK to use this approach.

We do not have another good way to convey the unavailability of the server, unless we mandate that the server accept the WebSocket connection and then send a ServerErrorResponse message with an UNAVAILABLE error. However, sending the ServerErrorResponse message is more expensive than just returning 503, and an overloaded Server may simply not be able to do that.

The only alternative I can think of is to get rid of the 503 response altogether and say that when overloaded the server should simply reject connections. But this would reduce troubleshootability for other language implementations where the response code is easily accessible.

Given the above, I think we should keep the 503 response for throttling, since it does not cause problems for JS, is more useful for troubleshooting than rejecting the connection, and is less expensive than the ServerErrorResponse message.

Note that we still need the ServerErrorResponse message, because the overload may occur after the WebSocket connection has existed for a while. In this case we don't want to just disconnect, since that will result in immediate reconnection, and we can't send an HTTP response code because it is too late, so we need to convey this information via a WebSocket message.

Any thoughts?

tigrannajaryan · 2021-11-18T18:05:24Z

Coming back to 429. This response is typically tied to a particular client or group of clients. It may not be possible to meaningfully calculate the rate and send a 429 response before the Agent's instance uid is known. That happens after the HTTP response headers are sent, and by then it is too late to send a 429 response back.

So, in practice it may be impossible to have per-Agent rate limiting and 429 responses tied to that. We may still be able to tie this to auth information, which is presumably available before the WebSocket is established. Auth is another open issue that we need to resolve before we can be sure about this.

pmm-sumo · 2021-11-18T18:14:53Z

Yeah, I believe that 503 is more universal and handling it would be enough. I am just wondering if the specification should be prescriptive about whether 429 may be used or not. I imagine that in certain environments there might be some 429 throttling capabilities available already, and having the agent support that would be helpful, especially since the agent would behave the same way as for 503.

tigrannajaryan · 2021-11-18T18:18:58Z

So perhaps we can just say that either 429 or 503 SHOULD be used for throttling and from protocol perspective there is no difference (the Agent SHOULD be ready to handle both, the Server can choose to use one or the other or both as applicable).

pmm-sumo · 2021-11-18T18:23:23Z

> So perhaps we can just say that either 429 or 503 SHOULD be used for throttling and from protocol perspective there is no difference (the Agent SHOULD be ready to handle both, the Server can choose to use one or the other or both as applicable).

Yes, that's exactly what I had in mind. I can prepare a quick PR with that change.

For WebSockets, I asked @kkruk-sumo for a consultation (we used that technology extensively in one of our previous projects); maybe he will have some suggestions here.

tigrannajaryan · 2021-11-18T18:28:38Z

> Yes, that's exactly what I had in mind. I can prepare a quick PR with that change

Please do.

tigrannajaryan · 2021-11-18T19:26:56Z

I think #37 resolves this. Closing, can reopen if needed.

kkruk-sumo · 2021-11-18T21:43:40Z

Unfortunately, JS needs special treatment here, and if it needs to be supported, I would consider one of the following:

- Accept connections and close them immediately with some arbitrary close code, or
- Accept connections (if they are not open yet), send a JSON message indicating the error and a retry-after parameter, and close them, or
- Close/reject connections and specify in the specification a minimum retry interval that JS clients need to follow.

tigrannajaryan · 2021-11-19T15:00:34Z

@kkruk-sumo please clarify what you think will not work for JS with the spec as is. Returning 503 to JS results in a connection "error" event in the browser. The spec already defines what the client needs to do when there is a connection error (exponential backoff). I think that should be sufficient for JS Agents. If you think otherwise, please say so.

kkruk-sumo · 2021-11-23T10:11:49Z

@tigrannajaryan I believe that "The minimum recommended retry interval is 30 seconds." from the spec is enough for JS.

pmm-sumo added a commit to pmm-sumo/opamp-spec that referenced this issue Nov 18, 2021

Support both HTTP 503 and 429 for throttling

b23167e

Contributes to open-telemetry#26

pmm-sumo mentioned this issue Nov 18, 2021

Support both HTTP 503 and 429 for throttling #37

Merged

tigrannajaryan pushed a commit that referenced this issue Nov 18, 2021

Support both HTTP 503 and 429 for throttling (#37)

fcdd78a

Contributes to #26

tigrannajaryan closed this as completed Nov 18, 2021
