cache: Clean up temporary mount pool on restart #2652

Merged 1 commit on Feb 23, 2022

@ktock ktock commented Feb 17, 2022

Following up on #2637 (comment)

Currently, if buildkitd crashes before unmounting its temporary (writable) overlayfs mounts, those mounts can leak.
When buildkitd then starts up again, the same issue of duplicated overlay mounts can occur because the old mounts remain on the host.

This commit fixes the issue by introducing a directory, cachemounts, under /var/lib/buildkit/ that holds all temporary overlay mounts.
When buildkitd starts, it cleans up any existing (old) mounts under that directory, which avoids duplicating overlay mounts.

@ktock ktock requested a review from sipsma February 17, 2022 08:13

@sipsma sipsma left a comment

Looks good, one comment about a corner case.

Also, if it's feasible it would be nice to have a test of this behavior. Unfortunately I don't think the client integ tests would support something like sending SIGKILL to buildkitd and then restarting it, but even something in the unit tests would be helpful.

cache/manager.go Outdated
dir := filepath.Join(opt.MountPoolRoot, file.Name())
logrus.Debugf("cleaning up existing temporary mount %q", dir)
if err := mount.Unmount(dir, 0); err != nil {
    logrus.WithError(err).Warnf("failed to unmount existing temporary mount %q", dir)

Collaborator

I guess if the host experienced a sudden power off then when buildkitd starts up again you could end up with dirs left here but no mounts. In this case Unmount would always fail but since we continue the dir would never be removed and this log would be printed every startup forever. Maybe it's better to check the err and try removing the dir if the error indicates there wasn't a mount there.

Also, nit about using bklog for these statements.

Collaborator Author

Thank you for the review.

> I guess if the host experienced a sudden power off then when buildkitd starts up again you could end up with dirs left here but no mounts. In this case Unmount would always fail but since we continue the dir would never be removed and this log would be printed every startup forever.

containerd's mount.Unmount ignores EINVAL, which is returned when the dir isn't a mount point, so I think this won't occur.

> Also, nit about using bklog for these statements.

Fixed to use bklog.

> Also, if it's feasible it would be nice to have a test of this behavior. Unfortunately I don't think the client integ tests would support something like sending SIGKILL to buildkitd and then restarting it, but even something in the unit tests would be helpful.

Thank you for the suggestion.
Added a unit test.

Collaborator

> containerd's mount.Unmount ignores EINVAL, which is returned when the dir isn't a mount point, so I think this won't occur.

Ah nice, good to know. Thanks for updating.

@ktock ktock force-pushed the sharemounts-cleanup branch 2 times, most recently from 5ae5d93 to 856bed2 Compare February 18, 2022 04:49

@sipsma sipsma left a comment

One more logrus, otherwise LGTM

cache/refs.go Outdated
for _, file := range files {
    if file.IsDir() {
        dir := filepath.Join(tmpdirRoot, file.Name())
        logrus.Debugf("cleaning up existing temporary mount %q", dir)

Collaborator

Still logrus here

Collaborator Author

Thank you for the review. Fixed this.

@tonistiigi
Member

@ktock Needs rebase

Signed-off-by: Kohei Tokunaga <ktokunaga.mail@gmail.com>
@ktock
Collaborator Author

ktock commented Feb 23, 2022

Thank you.
Rebased.

@tonistiigi tonistiigi merged commit f5a831c into moby:master Feb 23, 2022
@ktock ktock deleted the sharemounts-cleanup branch February 28, 2022 02:30