reverseproxy: do not close streams when explicitly configured #5637
Conversation
🤔 I'm not sure why we would want to allow this, honestly. Streams never closing means the reference to the old config is never dropped. If anything, it should be set to a big number so streams can close naturally, not to -1.
But large timeouts mean the timer and the associated config stay referenced for the duration of the timeout. Besides, the user will have to configure it themselves.
But with a timer you have a guarantee that memory will be cleaned up at some point. With -1 you have no guarantee and you're essentially asking for memory leaks. Definite foot-gun IMO. Do you have an actual use case for this? What problem would this solve? I don't think we should add this until we have a good reason to allow it.
I mean, the more I think about it, the more I wonder whether we need that timeout. Especially since we're now allowing streams to survive the config reloads. Since it's just a stream proxy, either the client or the backend is really responsible for terminating the stream.
Huh? We absolutely do need the close delay. That was the whole point of #5567. It's to provide a way to keep the connections alive for a certain amount of time past config reloads to allow connections to close naturally rather than being forcefully closed, if possible, but still guarantee they're all cleaned up and memory is properly garbage collected.
🤔 Seems it already handles the case where all connections are closed before the timer fires, so large timeouts will not cause the timer to linger longer than necessary. Closing now, unless a situation comes up that a big timeout can't solve.
Sorry; to clarify, let me summarize:
I actually kind of wish there was a way for the proxy, the backend, and the frontend to communicate, like, for the proxy to say: "Hey, I need to close this. If you still need this, can you re-establish a new connection without throwing a fit?" And the frontend could be like "Yeah bro, I'm cool wit dat," and the server could be like, "Sure. Whatever." Anyway, that's probably wishful thinking. So, maybe in 2.7, the situation should be:
Does that make sense?
I totally disagree with this. The frontend and backend have no incentive to close the connection to make any proxy happy. They'll keep it open as long as reasonably possible, they're greedy. For that reason, as a proxy, we can't trust that either of them will ever close the connection. Imagine a (not-quite-worst-case) scenario where Caddy is reloaded say every five minutes, but websocket connections can stay open for say 24 hours at worst. If every time the config is reloaded, a new websocket connection comes in and stays open 24 hours, then that's 12 per hour -> 288 per day. So at worst, Caddy could be forced to keep 288 copies of its config in memory. That's obviously ridiculous and could cause OOM etc depending on what else is in the config. (Also, yes a websocket connection could be open 24hrs+, for example if a user opens a browser tab then just leaves it sitting in the background, it'll never disconnect unless forced to, and will probably just be sending heartbeats back and forth forever as long as the connection stays healthy. I see it all the time in my apps.)
That is what we're doing though. That's why we implemented the websocket close frame. Clients are expected to reconnect when that happens. But that's still disruptive to user experience and can cause thundering herds if all the reconnects are at the exact same time. There's no other viable option at our disposal. There's no standard that exists for allowing a proxy to inject information into websocket streams. The traffic can be raw binary (protobuf or w/e), it can be JSON text, etc. We can't reasonably try to invent something either, that's not our place. Who would actually want to implement a Caddy-specific websocket traffic thing in their frontends and backends? I certainly don't, as someone who uses Caddy for this particular thing.
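For reference, at the wire level such a close frame is tiny. Here's an illustrative sketch (not Caddy's exact code) of sending one with status 1001, "going away", which is one reasonable choice for a shutting-down proxy:

```go
package wsclose

import "net"

// sendCloseFrame writes a minimal RFC 6455 close frame to a client
// connection. Server-to-client frames are unmasked; status code 1001
// ("going away") is 0x03E9 in big-endian.
func sendCloseFrame(conn net.Conn) error {
	frame := []byte{
		0x88,       // FIN=1, opcode 0x8 (connection close)
		0x02,       // payload length 2, mask bit 0
		0x03, 0xE9, // close status code 1001 (going away)
	}
	_, err := conn.Write(frame)
	return err
}
```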
Yep, still the case.
Yep, that's what we have. I called it a "stream close delay" because we also added a "stream timeout", which is an absolute timeout measured from the point where the stream is opened, rather than from when the server wanted to try to close it.
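For concreteness, a rough sketch of how those two knobs sit in a Caddyfile reverse_proxy block (the site address, upstream, and durations are placeholders):

```
example.com {
	reverse_proxy localhost:9000 {
		# grace period after a config reload before lingering streams are force-closed
		stream_close_delay 5m
		# absolute cap, measured from the moment the stream was opened
		stream_timeout 24h
	}
}
```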
But again, as I asked earlier: why? What's the use case for this? Who would need it? What problem would it solve? I'm not seeing it at all. I don't think it makes sense to provide a config option that obviously and certainly will cause memory leaks. That seems counterproductive.
@francislavoie Ok -- yeah, I see what you're saying. Let me try to explain a little more why this might not (always) be a problem:
I don't think this is true of the backend. If it leaves all connections open forever, even if idle, it'll run out of resources too. It's accepting lots of client connections. So clients, sure -- they generally only have ~1 connection each. But servers have to scale resources to serve their clients.
I'm not actually sure that the entire config is in memory just because the connection is open. Something or a part of it is definitely in memory (like the reverse proxy part), but I don't know if the whole thing is. I'd have to do some GC analysis, but I'm not quite sure how to do that. 🤔
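For what it's worth, a generic way to get a first look in Go (outside of Caddy itself) is to force a collection and dump a heap profile; the file name and output format here are arbitrary:

```go
package main

import (
	"fmt"
	"log"
	"os"
	"runtime"
	"runtime/pprof"
)

func main() {
	// Force a collection so the stats below reflect only live (reachable) objects.
	runtime.GC()

	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	fmt.Printf("live heap: %d bytes across %d objects\n", m.HeapAlloc, m.HeapObjects)

	// Dump a heap profile; `go tool pprof heap.out` can then show which types
	// (e.g. old config structs) are still reachable and what is retaining them.
	f, err := os.Create("heap.out")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()
	if err := pprof.Lookup("heap").WriteTo(f, 0); err != nil {
		log.Fatal(err)
	}
}
```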
Ok yes, valid questions. I think it's less about use case and market, and more about properly aligning the role of the proxy in relation to the backend. I personally have mixed feelings about what the "right" answer is, so I think it should be configurable. We should probably be able to trust the backend -- at least enough to trust that it will close connections when they're no longer needed, to avoid DoS'ing itself. IMO the proxy's only job is to proxy -- relay the connection from the frontend to the backend. I still think that in the default case a config reload should probably clean up old connections so they aren't using the old config anymore, but we know that's not suitable for everyone. Ok, so we let them configure some timeouts which can cause connections to survive the reloads. I believe that if the site owner thinks their backend will handle connections properly, there is good reason to not want the proxy to arbitrarily drop them. Honestly, if I was proxying websockets, I would probably use the -1 (no timeout) setting.
Yeah I agree a properly implemented backend would have an absolute lifetime for its sockets as well and would close way-too-old or left-hanging websockets, but there's nothing that dictates that they must. We can't make the assumption that every app does.
That's the thing, it is configurable. Just set a reasonably long delay. That prevents streams from being immediately closed, and guarantees that Caddy's memory usage will be kept in check. Best of both.
You're right, I agree -- which is why this won't be the default (for now).
I thought the problem before was that if one side closed the connection we didn't also close the other side? 🤔 Maybe I am remembering that wrong. But if I'm right, then that old behavior was definitely a leak, and the -1 value is still different, since we'll still close the other side when one side closes.
I can't argue that memory management is a good thing, but I will say that timeouts -- though perhaps better than nothing -- are kind of a hack in this case. Time doesn't correlate to memory use. You can't know how long it will take to fill up memory. So if you set too low of a timeout you'll bump clients needlessly. Too high and you'll run out of memory -- or your backend will first. The thing is, if Caddy holds onto lots of connections, so will the backend app. I feel like if you configure a -1, you want to optimize for client reliability because you know your backend will manage connections properly. Or maybe your clients are all authenticated, so you can trust them. I dunno.
No, we did close both sides when either closed. We just didn't do anything when config was reloaded.
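To illustrate what "close both sides when either closed" means in practice, here's a simplified sketch of that pattern -- not Caddy's actual implementation:

```go
package streamproxy

import (
	"io"
	"net"
)

// proxyStreams copies bytes in both directions and, as soon as either
// direction finishes (EOF or error), closes both connections so the
// other copy unblocks and neither side is left dangling.
func proxyStreams(client, backend net.Conn) {
	done := make(chan struct{}, 2)

	go func() {
		io.Copy(backend, client) // client -> backend
		done <- struct{}{}
	}()
	go func() {
		io.Copy(client, backend) // backend -> client
		done <- struct{}{}
	}()

	<-done // one direction ended...
	client.Close()
	backend.Close() // ...closing both unblocks the remaining copy
	<-done
}
```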
Yeah I agree, it's not perfect. But it is still better than nothing, and I think we should do as we usually do and wait for a user with an actual valid use case to ask for this if they actually need it; otherwise IMO we just add risky config for no reason.
This is difficult logic to argue. I'll cede and we can wait for a feature request :)
stream_close_delay of -1 now means streams will not be closed by Caddy.
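A sketch of what that would look like in a Caddyfile, assuming the -1 sentinel proposed in this PR (which, per the discussion above, may not ship); the upstream address is a placeholder:

```
reverse_proxy localhost:9000 {
	# -1 as proposed here: never force-close lingering streams on config reload;
	# cleanup is left entirely to the client and the backend
	stream_close_delay -1
}
```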