Graceful Exit Failure When Using Gateway and RPC Services Simultaneously within ServiceGroup #4261

Open
stonever opened this issue Jul 19, 2024 · 5 comments
@stonever

stonever commented Jul 19, 2024

Describe the bug
I am encountering an issue where services fail to exit gracefully when both gateway and RPC services are used together within a ServiceGroup. The problem does not occur when the gateway service is removed from the group.
To Reproduce
Steps to reproduce the behavior, if applicable:

	// Run the RPC server and the gateway in one ServiceGroup.
	serviceGroup := service.NewServiceGroup()
	defer serviceGroup.Stop()
	rpcServer := zrpc.MustNewServer(c.RpcServer, func(grpcServer *grpc.Server) {
		pb.RegisterFlowServer(grpcServer, flowserver.NewFlowServer(svcCtx))
		reflection.Register(grpcServer)
	})
	serviceGroup.Add(rpcServer)

	gw := gateway.MustNewServer(c.Gateway)
	serviceGroup.Add(gw)

	// Start blocks while the services run.
	serviceGroup.Start()
	// Expected after a graceful shutdown, but never printed.
	slog.Info("exiting")

Then stop the services with Ctrl+C.

The error is:

    The "exiting" message, which would indicate that the graceful shutdown completed, is never printed.

Expected behavior
The "exiting" message is printed.

Environments (please complete the following information):

  • OS: [e.g. Linux]
  • go-zero version [e.g. 1.6.6]

More description
Through debugging, I have identified two key issues that prevent graceful shutdown:

Gateway WaitGroup blockage:
The gateway appears to block on a WaitGroup because some of its shutdown listeners (registered via proc) never finish. The gateway waits for them indefinitely, which prevents a graceful exit.

RPC service connection hang:
The RPC service cannot exit gracefully because one connection never closes. That connection belongs to the gateway, which suggests a possible deadlock in which each service waits for the other to finish shutting down.

Together, these findings point to a coordination problem between the gateway and the RPC service during shutdown, possibly caused by improper handling of shutdown signals or of synchronization primitives such as WaitGroups.

@kevwan kevwan self-assigned this Jul 19, 2024
@kevwan
Contributor

kevwan commented Jul 22, 2024

I used the following code and couldn't reproduce the problem.

func main() {
	flag.Parse()

	var c config.Config
	conf.MustLoad(*configFile, &c)

	group := service.NewServiceGroup()
	gw := gateway.MustNewServer(c.Gateway)
	group.Add(gw)

	ctx := svc.NewServiceContext(c)
	s := zrpc.MustNewServer(c.RpcServerConf, func(grpcServer *grpc.Server) {
		pb.RegisterGreetServer(grpcServer, server.NewGreetServer(ctx))
		reflection.Register(grpcServer)
	})
	group.Add(s)

	fmt.Printf("Starting rpc server at %s...\n", c.ListenOn)
	group.Start()
}

Would you please give me the full code on this issue?

@stonever
Author

awesomeProject.zip
I wrote a minimal demo; please take a look. @kevwan
Log:

^C{"@timestamp":"2024-07-25T15:28:12.737+08:00","caller":"proc/shutdown.go:58","content":"Got signal 2, shutting down...","level":"info"}
{"@timestamp":"2024-07-25T15:28:13.738+08:00","caller":"service/servicegroup.go:53","content":"Shutting down services in group","level":"info"}
{"@timestamp":"2024-07-25T15:28:13.738+08:00","caller":"stat/metrics.go:210","content":"(gateway) - qps: 0.0/s, drops: 0, avg time: 0.0ms, med: 0.0ms, 90th: 0.0ms, 99th: 0.0ms, 99.9th: 0.0ms","level":"stat"}
{"@timestamp":"2024-07-25T15:28:13.738+08:00","caller":"stat/metrics.go:210","content":"(flow.rpc) - qps: 0.0/s, drops: 0, avg time: 0.0ms, med: 0.0ms, 90th: 0.0ms, 99th: 0.0ms, 99.9th: 0.0ms","level":"stat"}
{"@timestamp":"2024-07-25T15:28:42.738+08:00","caller":"proc/shutdown.go:65","content":"Still alive after 30s, going to force kill the process...","level":"info"}

@kevwan
Contributor

kevwan commented Aug 5, 2024

Thanks for your demo code!

I found that in zrpc, calling server.GracefulStop() blocks, while calling server.Stop() works.

I'm digging into it.

@stonever
Author

@kevwan Any update? Thx

@kevwan
Contributor

kevwan commented Aug 27, 2024

Still working on it. I'll get back to you when I've made more progress.
