server: wait for comet node shutdown
cometbft's service Start() methods do not block while running,
so the function passed to errgroup.Go for this was returning almost
right away. This meant that errgroup.Wait() did not wait for shutdown
to complete. The effect was frequent errors from cometbft arising
from one of our databases / key stores being shut down too soon.

This is now resolved: the dependencies' `closers` aren't closed
until the node is actually stopped.

However, there is still an internal cometbft bug during block sync
(before the node switches to consensus mode) because of an
unsupervised goroutine, `blocksync.(*Reactor).poolRoutine`. I fixed
this bug in blocksync/reactor.go and the errors go away. We may
need to PR cometbft to get clean shutdown during block sync.
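
To illustrate what "supervised" means here, the following is a minimal sketch using only the standard library. It is not CometBFT's code: the `reactor` type, its `loop` method, and the `exited` flag are hypothetical stand-ins. The idea is that Stop() signals the background goroutine and then waits for it to return, so nothing the goroutine depends on can be closed while it is still running.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// reactor is a hypothetical type used only for illustration. It is not
// CometBFT's blocksync reactor.
type reactor struct {
	quit   chan struct{}
	wg     sync.WaitGroup
	exited atomic.Bool // set when the background loop has returned
}

// Start launches the background loop and registers it with the WaitGroup
// so that Stop can wait for it.
func (r *reactor) Start() {
	r.quit = make(chan struct{})
	r.wg.Add(1)
	go func() {
		defer r.wg.Done()
		r.loop()
	}()
}

// loop stands in for a long-running routine. Here it only blocks until
// it is told to quit.
func (r *reactor) loop() {
	defer r.exited.Store(true)
	<-r.quit
}

// Stop signals the loop and blocks until the goroutine has returned.
func (r *reactor) Stop() {
	close(r.quit)
	r.wg.Wait()
}

// loopExitedAfterStop reports whether the loop had finished by the time
// Stop returned.
func loopExitedAfterStop() bool {
	r := &reactor{}
	r.Start()
	r.Stop()
	return r.exited.Load()
}

func main() {
	fmt.Println("loop exited before Stop returned:", loopExitedAfterStop())
}
```

An unsupervised goroutine would omit the WaitGroup, in which case Stop() could return while the loop is still touching a database that the caller is about to close.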
jchappelow committed Dec 20, 2023
1 parent b1e01ad commit 4bfbecf
Showing 1 changed file with 12 additions and 9 deletions.
21 changes: 12 additions & 9 deletions cmd/kwild/server/server.go
@@ -167,17 +167,20 @@ func (s *Server) Start(ctx context.Context) error {
 	s.log.Info("grpc server started", zap.String("address", s.cfg.AppCfg.AdminListenAddress))
 
 	group.Go(func() error {
-		go func() {
-			<-groupCtx.Done()
-			s.log.Info("stop comet server")
-			if err := s.cometBftNode.Stop(); err != nil {
-				s.log.Warn("failed to stop comet server", zap.Error(err))
-			}
-		}()
+		// The CometBFT services do not block on Start().
+		if err := s.cometBftNode.Start(); err != nil {
+			return err
+		}
+		s.log.Info("comet node is now started")
 
-		return s.cometBftNode.Start()
+		<-groupCtx.Done()
+		s.log.Info("stop comet server")
+		if err := s.cometBftNode.Stop(); err != nil {
+			return fmt.Errorf("failed to stop comet server: %w", err)
+		}
+		s.log.Info("comet server is stopped")
+		return nil
 	})
-	s.log.Info("comet node started")
 
 	err := group.Wait()