Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

process hangs due to default socket_timeout=None #119

Closed
zioproto opened this issue Nov 7, 2024 · 3 comments · Fixed by #120
Closed

process hangs due to default socket_timeout=None #119

zioproto opened this issue Nov 7, 2024 · 3 comments · Fixed by #120

Comments

@zioproto
Copy link
Contributor

zioproto commented Nov 7, 2024

This is a recurring issue, previously reported in this GitHub issue.

I encountered the same problem while testing a Valkey Cluster on Kubernetes. When a Valkey Cluster pod is evicted, my client attempts to reconnect immediately, but gets stuck in the SYN_SENT state indefinitely, trying to connect to an IP address that no longer exists in the Kubernetes cluster. In Kubernetes, pods are ephemeral, and each new pod is assigned a new IP address.

From the ValkeyCluster documentation, it's not immediately clear that developers need to configure the TCP timeout. See the relevant section here. Furthermore, I don’t believe developers building applications that connect to Valkey should have to handle low-level TCP parameters.

Proposed solutions:

  1. Set a default socket_timeout value

    • Pros: This would solve the problem immediately.
    • Cons: It may be difficult to choose a default value that fits all scenarios. For example, what would be a reasonable default value for Valkey traffic? Perhaps 2 seconds?
  2. Make socket_timeout a mandatory parameter

    • Pros: Avoids the need to guess a suitable default value.
    • Cons: This would introduce a breaking change.
zioproto added a commit to zioproto/valkey-py that referenced this issue Nov 7, 2024
zioproto added a commit to zioproto/valkey-py that referenced this issue Nov 7, 2024
Fixes valkey-io#119

Signed-off-by: Saverio Proto <saverioproto@microsoft.com>
@aiven-sal
Copy link
Member

Hi!
Thanks for reporting the problem and providing a PR!

I'm not very keen on breaking compatibility, so I'd rule out solution 2.
But also solution 1 has a potential of being a breaking change, regardless of the value.
I'll need to dig a bit deeper to see how big of a problem this could be.

A solution 3 could be to improve the doc and make it very clear that socket_timeout is something that needs to be set.
I agree that exposing a low-level setting is not ideal, but there is no way have a "one size fits all" default here, so many users may want to set this themselves anyway. Whatever new default value we choose isn't necessarily better than the current one, in the general case.

@zioproto
Copy link
Contributor Author

zioproto commented Nov 7, 2024

@aiven-sal should we have a look what other Valkey / Redis client implementations ( even in different languages ) are doing when setting TCP timeouts?

For example I see the golang client uses a default value of 3 seconds for socket_timeout and 5 seconds for socket_connect_timeout:
https://github.com/redis/go-redis/blob/80c9f5bb777dd4e6393d014cffe9d28428ecf756/options.go#L83-L97

The Node Client is using 5 seconds:
https://github.com/redis/node-redis/blob/b87b8b113430eb5e860ab435dc84e141a2e8ae98/docs/client-configuration.md?plain=1#L11

It seems that not setting a default is not a good idea :(

@aiven-sal
Copy link
Member

aiven-sal commented Nov 7, 2024

I agree that having a default would better. 2, 3 or 5 they should all be fine (but maybe I'd pick 5 just to stay on the safe side)
I was just worried about the implications of changing the default: I think it's unlikely that someone out there is relying on the fact that the connection never times out, but not impossible.

But it's about time we make a 6.1 where we can introduce some breaking changes. So I think we can change this default to 5.

zioproto added a commit to zioproto/valkey-py that referenced this issue Nov 7, 2024
Fixes valkey-io#119

Signed-off-by: Saverio Proto <saverioproto@microsoft.com>
Rafiot pushed a commit to Rafiot/valkey-py that referenced this issue Dec 3, 2024
Fixes valkey-io#119

Signed-off-by: Saverio Proto <saverioproto@microsoft.com>
Signed-off-by: Raphaël Vinot <raphael@vinot.info>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants