Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Should we use HTTP 503 response for throttling? #26

Closed
tigrannajaryan opened this issue Nov 16, 2021 · 12 comments
Closed

Should we use HTTP 503 response for throttling? #26

tigrannajaryan opened this issue Nov 16, 2021 · 12 comments

Comments

@tigrannajaryan
Copy link
Member

tigrannajaryan commented Nov 16, 2021

The spec says that the server can respond with 503 to throttle the client.

However, reading the http response code may not be possible with some WebSocket clients, e.g. for browsers (it is disabled for security reasons).

Should we still support this throttling method or avoid it?

@pmm-sumo
Copy link
Contributor

Also, this is a slightly different reason why being throttled, but should we also mention HTTP 429 Too Many Requests next to 503? (I believe the behavior would be the same as for 503 otherwise)

@tigrannajaryan
Copy link
Member Author

I could never figure out what is the use case for 429 when 503 achieves exactly the same from client's perspective. 429 is also supposed to return exactly the same Retry-After header. The slight nuance seems to be that 503 means server is overloaded and unable to respond properly, while 429 says that it could respond but will not because the client is exceed some sort of quota.

However, I can't think of anything that the Agent could be doing differently in response to 503 and 429, so I don't know what we gain by adding both to the spec. Perhaps the benefit is in helping the troubleshooting from the Agent side (the Agent can log the response code)?

@tigrannajaryan
Copy link
Member Author

I experimented a bit with 503 responses in Go and JS.

In Go Websocket client (WSGorilla) the response code is clearly accessible, so no problem here, it can be used to indicate throttling.

In JS the 503 response results in an "error" event. This event is indistinguishable from a connection error when the server is down. This seems to be OK, since the protocol defines that if the connection cannot be established exponential backoff should be used, which essentially almost is what we also want to happen for 503 responses (minus the ability to specify exact retry interval). So, for JS it also seems to be OK to use this approach.

We do not have another good way to convey the unavailability of the server, unless we mandate the server to accept the Websocket connection and then send a ServerErrorResponse message with UNAVAILABLE error. However sending the ServerErrorResponse message is more expensive than just returning 503, the Server may simply not be able to do that.

The only alternate I can think of is just get rid of 503 response altogether and say that when overloaded the server simply should reject connections. But this would reduce the troubleshootability for other language implementations where the response code is easily accessible.

Given the above I think we should keep 503 response for throttling, since it does not cause problem for JS, is useful for troubleshooting compared to rejecting connection, and is less expensive than the ServerErrorResponse message.

Note that we still do need ServerErrorResponse message response because the overloading situation may occur after the WebSocket connection exists for a while. In this case we don't want to just disconnect since it will result in immediate re-connection, we also can't send an HTTP response code because it is too late, so we need to convey this information via a WebSocket message.

Any thoughts?

@tigrannajaryan
Copy link
Member Author

Coming back to 429. This response is typically tied to a particular client or group of clients. It may not be possible to meaningfully calculate the rate and send a 429 response before the Agent's instance uid is known. That happens after the HTTP response headers are sent, and by then it is too late to try to send 429 response back.

So, in practice it may be impossible to have per-Agent rate limiting and 429 responses tied to that. We may still be able to tie this to auth information which supposedly is available before the WebSocket is established. This auth is another open issue that we need to answer before we can be sure about this.

@pmm-sumo
Copy link
Contributor

Yeah, I believe that 503 is more universal and handling it would be enough. I am just wondering if the specification should be prescriptive whether 429 could be used or not. I imagine that in certain environments there might be some 429 throttling capabilities available already and having agent supporting that would be helpful, especially since the agent would behave the same way as for 503

@tigrannajaryan
Copy link
Member Author

So perhaps we can just say that either 429 or 503 SHOULD be used for throttling and from protocol perspective there is no difference (the Agent SHOULD be ready to handle both, the Server can choose to use one or the other or both as applicable).

@pmm-sumo
Copy link
Contributor

So perhaps we can just say that either 429 or 503 SHOULD be used for throttling and from protocol perspective there is no difference (the Agent SHOULD be ready to handle both, the Server can choose to use one or the other or both as applicable).

Yes, that's exactly what I had on mind. I can prepare a quick PR with that change

For WebSockets, I asked @kkruk-sumo for consultation (we used to use that technology extensively in one of the previous projects), maybe he will have some suggestions here

@tigrannajaryan
Copy link
Member Author

Yes, that's exactly what I had on mind. I can prepare a quick PR with that change

Please do.

pmm-sumo added a commit to pmm-sumo/opamp-spec that referenced this issue Nov 18, 2021
tigrannajaryan pushed a commit that referenced this issue Nov 18, 2021
@tigrannajaryan
Copy link
Member Author

I think #37 resolves this. Closing, can reopen if needed.

@kkruk-sumo
Copy link

Unfortunately, JS needs a special treatment here and if it's needed to be supported, I would consider to:

  1. Accept connections and close them immediately with some arbitrary closing code, or,
  2. Accept connections (if there are not open yet), send a json indicating the error and retry-after parameter and close them, or,
  3. Close/reject connections and specify a minimum retry interval in the specification that JS clients needs to follow.

@tigrannajaryan
Copy link
Member Author

@kkruk-sumo please clarify what you think will not work for JS with the spec as is. Returning 503 to JS results in a connection "error" event in the browser. The spec already defines what the client needs to do when to do when there is a connection error (exponential backoff). I think that should be sufficient for JS Agents. If you think otherwise please tell.

@kkruk-sumo
Copy link

@tigrannajaryan I believe that "The minimum recommended retry interval is 30 seconds." from the spec is enough for JS.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants