Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

restart helper: give container process some time to shutdown #2789

Merged
merged 1 commit into from
Jan 22, 2024

Conversation

maartendeprez
Copy link

What issue type does this pull request address? (keep at least one, remove the others)
/kind bugfix

What does this pull request do? Which issues does it resolve?
resolves #2788

Please provide a short message that should be published in the DevSpace release notes
The helper restart command would wait for 2 microseconds (2000 nanoseconds, supposedly meant to be 2 seconds) between sending SIGINT and SIGKILL to the subcommand. Changed this to send SIGINT, SIGTERM and SIGKILL with five seconds in between, checking for process termination.

What else do we need to know?

Copy link

netlify bot commented Jan 12, 2024

Deploy Preview for devspace-docs canceled.

Built without sensitive environment variables

Name Link
🔨 Latest commit e02449d
🔍 Latest deploy log https://app.netlify.com/sites/devspace-docs/deploys/65a4fabeae406a00087a6a58

Copy link
Collaborator

@lizardruss lizardruss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added some ideas for reducing the repetition.

Comment on lines 78 to 85
for i := 0; i < 5; i++ {
time.Sleep(time.Second)
_, err := os.Stat(path)
if os.IsNotExist(err) {
stderrlog.Infof("Process terminated!")
return nil;
}
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could use wait.PollImmediate here instead of a loop. It's already included in the helper it seems.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated. According to the docs, wait.PollImmediate is deprecated, but the version already included in the helper does not have the replacement function yet, so i used it anyway. Would it be better to update the import?

Alternatively, we could wait for the process using the pidfd interface. Supposedly this would be slightly more efficient, but also, more importantly, it would avoid a race condition in case the pid would be reused between sending the signal and polling for termination. It would require an extra import and might make the helper somewhat less portable though (the syscall appeared in Linux 5.3).

func KillPidFd(pid int) {

        fmt.Printf("Opening process %d\n", pid)

        fd, err := pidfd.Open(pid, 0)
        if err != nil {
                fmt.Printf("Error: %s\n", err.Error())
                return
        }

        for _, sig := range []syscall.Signal{syscall.SIGINT, syscall.SIGTERM, syscall.SIGKILL} {
                fmt.Printf("Sending %s signal...\n", sig.String())

                err = fd.SendSignal(sig, 0)
                if err != nil {
                        fmt.Printf("Error: %s\n", err.Error())
                        return
                }

                fds := []unix.PollFd{{Fd: int32(fd), Events: unix.POLLIN}}
                n, _ := unix.Poll(fds, 5000 /* milliseconds */)

                if n > 0 {
                        fmt.Printf("Process terminated!\n")
                        return
                }
        }

        fmt.Printf("Timeout!\n")

}

Copy link
Collaborator

@lizardruss lizardruss Jan 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@maartendeprez

This is already a good improvement over the original code. I think we can wait to upgrade to the replacement for wait.PollImmediate, so that it can be upgraded outside the helper as well.

Regarding the pidfd interface, if the race condition has been observed in your testing, and happens frequently enough, then we can consider adding it. If it hasn't been observed then we can skip for now.

}
}

stderrlog.Infof("Sending SIGTERM...")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might consider moving the wait logic into a function. Something along the lines of:

stderrlog.Infof("Sending SIGINT...")
_ = syscall.Kill(-pgid, syscall.SIGINT)
if done, err := waitForTerminated(pgid); err != nil {
  return err
} else if done {
  return nil
}

Signed-off-by: Maarten Deprez <mdp@si-int.eu>
@maartendeprez maartendeprez force-pushed the helper-restart-timings branch from 69b7955 to e02449d Compare January 15, 2024 09:28
Copy link
Collaborator

@lizardruss lizardruss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM
Resolves ENG-2674

@lizardruss lizardruss merged commit af45484 into devspace-sh:main Jan 22, 2024
18 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Helper restart does not give container process the chance to shut down properly
2 participants