Graceful Exit Failure When Using Gateway and RPC Services Simultaneously within ServiceGroup #4261

Closed
stonever opened this issue Jul 19, 2024 · 6 comments · Fixed by #4531
Labels
kind/in-progress Issues and PRs that in progress

Comments

@stonever

stonever commented Jul 19, 2024

Describe the bug
I am encountering an issue where services fail to exit gracefully when both gateway and RPC services are used together within a ServiceGroup. The problem does not occur when the gateway service is removed from the group.
To Reproduce
Steps to reproduce the behavior, if applicable:

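	// Fragment from main(); c is the loaded config and svcCtx the service context.
	serviceGroup := service.NewServiceGroup()
	defer serviceGroup.Stop()

	// gRPC server registered first.
	rpcServer := zrpc.MustNewServer(c.RpcServer, func(grpcServer *grpc.Server) {
		pb.RegisterFlowServer(grpcServer, flowserver.NewFlowServer(svcCtx))
		reflection.Register(grpcServer)
	})
	serviceGroup.Add(rpcServer)

	// Gateway added to the same group; removing it makes the problem go away.
	gw := gateway.MustNewServer(c.Gateway)
	serviceGroup.Add(gw)

	serviceGroup.Start()
	// Expected to run once the group has shut down, but is never reached.
	slog.Info("exiting")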

Attempt to stop the services gracefully by pressing Ctrl+C.

The error is:

    The process never prints the "exiting" message that would indicate a successful graceful shutdown.

Expected behavior
The "exiting" message is printed after the services shut down.

Environments (please complete the following information):

  • OS: [e.g. Linux]
  • go-zero version [e.g. 1.6.6]

More description
Through debugging, I have identified two key issues that prevent graceful shutdown:

Gateway WaitGroup Blockage:
The gateway service appears to block on a WaitGroup because not all of its shutdown listeners (registered via proc) have completed their shutdown procedures. This results in the gateway waiting indefinitely for these listeners to finish, preventing a graceful exit.

RPC Service Connection Hang:
The RPC service is unable to exit gracefully because a persistent connection refuses to close. This connection is associated with the gateway service, suggesting a possible deadlock scenario where each service is waiting on the other to complete its shutdown procedure. This cyclic dependency prevents both services from terminating properly.

These insights indicate that there might be a coordination issue between the gateway and RPC services during the shutdown process, possibly due to improper handling of shutdown signals or synchronization primitives like WaitGroups.

@kevwan kevwan self-assigned this Jul 19, 2024
@kevwan
Contributor

kevwan commented Jul 22, 2024

I used the following code but couldn't reproduce the problem.

func main() {
	flag.Parse()

	var c config.Config
	conf.MustLoad(*configFile, &c)

	group := service.NewServiceGroup()
	gw := gateway.MustNewServer(c.Gateway)
	group.Add(gw)

	ctx := svc.NewServiceContext(c)
	s := zrpc.MustNewServer(c.RpcServerConf, func(grpcServer *grpc.Server) {
		pb.RegisterGreetServer(grpcServer, server.NewGreetServer(ctx))
		reflection.Register(grpcServer)
	})
	group.Add(s)

	fmt.Printf("Starting rpc server at %s...\n", c.ListenOn)
	group.Start()
}

Could you please share the full code that reproduces this issue?

@stonever
Author

awesomeProject.zip
I wrote a minimal demo; please take a look. @kevwan
Log:

^C{"@timestamp":"2024-07-25T15:28:12.737+08:00","caller":"proc/shutdown.go:58","content":"Got signal 2, shutting down...","level":"info"}
{"@timestamp":"2024-07-25T15:28:13.738+08:00","caller":"service/servicegroup.go:53","content":"Shutting down services in group","level":"info"}
{"@timestamp":"2024-07-25T15:28:13.738+08:00","caller":"stat/metrics.go:210","content":"(gateway) - qps: 0.0/s, drops: 0, avg time: 0.0ms, med: 0.0ms, 90th: 0.0ms, 99th: 0.0ms, 99.9th: 0.0ms","level":"stat"}
{"@timestamp":"2024-07-25T15:28:13.738+08:00","caller":"stat/metrics.go:210","content":"(flow.rpc) - qps: 0.0/s, drops: 0, avg time: 0.0ms, med: 0.0ms, 90th: 0.0ms, 99th: 0.0ms, 99.9th: 0.0ms","level":"stat"}
{"@timestamp":"2024-07-25T15:28:42.738+08:00","caller":"proc/shutdown.go:65","content":"Still alive after 30s, going to force kill the process...","level":"info"}

@kevwan
Contributor

kevwan commented Aug 5, 2024

Thanks for your demo code!

I found that in zrpc, calling server.GracefulStop() blocks, whereas calling server.Stop() works.

I'm digging into it.

@stonever
Author

@kevwan Any update? Thx

@kevwan
Contributor

kevwan commented Aug 27, 2024

Still working on it. I'll get back to you when I've made more progress.

@kevwan
Contributor

kevwan commented Jan 1, 2025

> @kevwan Any update? Thx

I've tested with your demo. Please give it a try as well.
