
Grpc time out for 15 minutes #7587

Closed
ptlanu22 opened this issue Nov 3, 2020 · 7 comments
ptlanu22 commented Nov 3, 2020
ptlanu22 commented Nov 3, 2020

Hey, I am using the versions of the gRPC libraries listed below. We have 2 services running on gRPC: service A creates a ManagedChannel during startup and uses it throughout the lifetime of the pod. Service A receives heavy load (100 req/s). Sometimes we keep receiving timeout errors for around 15 minutes and no call reaches service B; after 15 minutes this resolves automatically. After some browsing I found this could be because TCP_USER_TIMEOUT is set to around 15 minutes by default. Can someone confirm whether this is the same issue, or is there something else I am missing?

```xml
<dependency>
    <groupId>io.grpc</groupId>
    <artifactId>grpc-stub</artifactId>
    <version>1.20.0</version>
</dependency>
<dependency>
    <groupId>io.grpc</groupId>
    <artifactId>grpc-core</artifactId>
    <version>1.20.0</version>
</dependency>
<dependency>
    <groupId>io.grpc</groupId>
    <artifactId>grpc-netty-shaded</artifactId>
    <version>1.20.0</version>
</dependency>
```

Below is the code used to create the channel:

```java
ManagedChannel managedChannel = ManagedChannelBuilder.forAddress(host, port)
    .usePlaintext()
    .build();
stub = XServiceGrpc.newBlockingStub(managedChannel).withWaitForReady();
```
ejona86 (Member) commented Nov 3, 2020

This was posted to Gitter as well, and I was answering it there. In short, v1.20.0 is unsupported, as it is a year and a half old. But you can try setting keepAliveTime on the ManagedChannelBuilder to 5 minutes or so and see whether that changes the behavior.
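
A minimal sketch of that suggestion, reusing the host, port, stub, and XServiceGrpc names from the snippet in the issue (the 5-minute value is only the figure mentioned above, not a verified fix):

```java
import java.util.concurrent.TimeUnit;

import io.grpc.ManagedChannel;
import io.grpc.ManagedChannelBuilder;

// Sketch only: same channel as in the issue, but with client-side keepalive enabled
// so a dead connection is noticed by the client instead of hanging until the
// OS-level TCP timeout (roughly 15 minutes) gives up.
ManagedChannel managedChannel = ManagedChannelBuilder.forAddress(host, port)
    .usePlaintext()
    .keepAliveTime(5, TimeUnit.MINUTES)  // send an HTTP/2 PING after 5 minutes without reads
    .build();
stub = XServiceGrpc.newBlockingStub(managedChannel).withWaitForReady();
```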

MrCookie0313 commented

We call a Go service over gRPC. In production, every time the Go server side updates its pods, requests fail for about 15 minutes and then recover on their own. Is this the same issue?

MrCookie0313 commented

(quotes the original issue description and code)

Different cluster

grpc.clients.my-grpc-server-app.address=dns:///my-grpc-server-app.example.svc.cluster.local:1234 (if the target is a domain name, you can follow this example and add the dns:/// prefix)
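
A minimal sketch of that suggestion in the Java client used in this thread; the service name and port are only the illustrative values from the line above, not real endpoints:

```java
import io.grpc.ManagedChannel;
import io.grpc.ManagedChannelBuilder;

// Sketch only: point the channel at a dns:/// target instead of forAddress(host, port),
// so gRPC's DNS name resolver is used for the in-cluster service name.
ManagedChannel channel = ManagedChannelBuilder
    .forTarget("dns:///my-grpc-server-app.example.svc.cluster.local:1234")
    .usePlaintext()
    .build();
```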

ptlanu22 (Author) commented Nov 6, 2020

@ejona86 I was going through this blog: https://blog.cloudflare.com/when-tcp-sockets-refuse-to-die/
I suspected this behaviour because the timeout window was similar to the ~15 minutes described there. TCP_USER_TIMEOUT support was also added to grpc-java: https://github.com/grpc/proposal/blob/master/A18-tcp-user-timeout.md

If I set keepAliveTime to 5 minutes, how will that solve the problem if I have continuous traffic every second?
Also, I am using Java at both ends (client and server), inside a Kubernetes cluster, and connecting to the server through an ingress URL.

ejona86 (Member) commented Nov 9, 2020

Enabling keepAliveTime enables an HTTP/2-level PING, which would detect the broken connection, and it also enables TCP_USER_TIMEOUT.
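
As a rough sketch of how those two settings pair up on the builder (the address and the 20-second timeout are illustrative assumptions, not values from this thread):

```java
import java.util.concurrent.TimeUnit;

import io.grpc.ManagedChannel;
import io.grpc.ManagedChannelBuilder;

// Sketch only: with keepalive enabled, the client sends an HTTP/2 PING after
// keepAliveTime without reads and drops the connection if no ack arrives within
// keepAliveTimeout; per gRFC A18, grpc-netty(-shaded) also sets TCP_USER_TIMEOUT
// on Linux so unacknowledged writes fail long before the ~15-minute OS default.
ManagedChannel channel = ManagedChannelBuilder.forAddress("service-b-host", 1234)
    .usePlaintext()
    .keepAliveTime(5, TimeUnit.MINUTES)      // idle time before sending a PING
    .keepAliveTimeout(20, TimeUnit.SECONDS)  // how long to wait for the PING ack
    .build();
```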

ejona86 (Member) commented Nov 17, 2020

Seems like this is resolved. If not, comment, and it can be reopened.

ejona86 closed this as completed Nov 17, 2020
github-actions bot locked as resolved and limited conversation to collaborators Jun 3, 2021