
[batch] Check /proc/mounts for straggler cloudfuse mounts #12986

Merged

6 commits merged into hail-is:main on May 4, 2023

Conversation

@jigold (Contributor) commented May 4, 2023

No description provided.

@@ -419,6 +419,9 @@ async def _fuse_mount(self, destination: str, *, credentials_path: str, tmp_path
async def _fuse_unmount(self, path: str):
    assert CLOUD_WORKER_API
    await CLOUD_WORKER_API.unmount_cloudfuse(path)
    proc_mounts_stdout, _ = await check_shell_output('cat /proc/mounts')
    if path in str(proc_mounts_stdout):
        raise IncompleteCloudFuseCleanup(f'incomplete cloudfuse unmounting: {proc_mounts_stdout}')
danking (Contributor) previously requested changes May 4, 2023

Ah, I see now. This looks good, but can we just use Python's open instead of cat to read the contents of /proc/mounts?

with open('/proc/mounts', 'r') as f:
    output = f.read()
if self.cloudfuse_base_path() in output:
    raise IncompleteCloudFuseCleanup(f'incomplete cloudfuse unmounting: {output}')
Contributor

I'm fairly sure exceptions raised in DockerJob.cleanup are not logged anywhere. Using the raise to skip the xfs_quota and rmtree seems fine but we need to log the exception here.
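A minimal sketch of logging before raising, assuming a module-level logger; check_cloudfuse_cleanup and the local exception definition are illustrative, not the PR's actual code:

import logging

log = logging.getLogger('worker')


class IncompleteCloudFuseCleanup(Exception):
    pass


def check_cloudfuse_cleanup(base_path: str):
    # Read /proc/mounts once; if base_path is still mounted, log the problem
    # before raising, since an exception raised from cleanup() may otherwise
    # never appear in the worker log.
    with open('/proc/mounts', 'r') as f:
        output = f.read()
    if base_path in output:
        log.error(f'incomplete cloudfuse unmounting: {output}')
        raise IncompleteCloudFuseCleanup(f'incomplete cloudfuse unmounting: {output}')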

with open('/proc/mounts', 'r') as f:
    output = f.read()
if path in output:
    raise IncompleteCloudFuseCleanup(f'incomplete cloudfuse unmounting: {output}')
Contributor

This would read this file for each mount rather than once after all attempts to unmount. Let's make JVMJob symmetric to DockerJob and check the mounts in cleanup in both Jobs. I think we should also make sure we attempt to upload the log before we check /proc/mounts.

Contributor

Ah true, there's no need to put this logic in the cloudfuse cache because we never rmtree things in here. The attack vector is any bind mount left over in a directory that we will subsequently rmtree, so the way to handle the JVMJob case is to check /proc/mounts at the end of a JVM job for any leftover mounts in the scratch directory.
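A minimal sketch of the kind of scratch-directory check described here, assuming a plain prefix match on the mount-point field of /proc/mounts is sufficient; leftover_mounts_under is a hypothetical helper, not code from this PR:

def leftover_mounts_under(directory: str) -> list:
    # Scan /proc/mounts for mount points under `directory` (e.g. a job's scratch
    # directory). Any hit means a later rmtree of `directory` could reach through
    # a still-attached bind or FUSE mount and delete data it shouldn't.
    prefix = directory.rstrip('/') + '/'
    leftover = []
    with open('/proc/mounts', 'r') as f:
        for line in f:
            mount_point = line.split()[1]
            if mount_point == directory or mount_point.startswith(prefix):
                leftover.append(mount_point)
    return leftover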

@jigold (Contributor Author) commented May 4, 2023

Why are we checking for paths in the scratch directory? Should I be checking for the scratch directory in Docker jobs as well?

@@ -2234,6 +2230,11 @@ async def cleanup(self):
    await self.jvm.cloudfuse_mount_manager.unmount(mount_path, user=self.user, bucket=bucket)
    config['mounted'] = False

    with open('/proc/mounts', 'r') as f:
        output = f.read()
    if self.cloudfuse_base_path() in output:
        raise IncompleteCloudFuseCleanup(f'incomplete cloudfuse unmounting: {output}')
@jigold (Contributor Author)

I think my question is: won't this fuse path remain alive if there are other jobs reading the data? That's why I had the check after we actually call fusermount.

Contributor

This is not a fuse path; it's a bind mount path, which is the thing that the JVMJob actually owns. Here's an example:

  1. JVM 1 wants bucket foo at /cloudfuse/JVM-1/foo.
  2. The FUSE cache manager makes two mounts:
     • a FUSE mount at /cloudfuse/read_only/abc123/foo
     • a bind mount of /cloudfuse/read_only/abc123/foo at /cloudfuse/JVM-1/foo
  3. JVM 2 wants bucket foo at /cloudfuse/JVM-2/foo.
  4. The FUSE cache manager reuses the underlying FUSE mount and bind mounts /cloudfuse/read_only/abc123/foo to /cloudfuse/JVM-2/foo.
  5. JVM 1 finishes and wants to clean up; it calls cloudfuse_mount_manager.unmount.
  6. The FUSE cache manager unmounts the bind mount at /cloudfuse/JVM-1/foo. The underlying FUSE mount is still being used by another job, so it doesn't do anything else.

At this point, the only thing JVM 1 needs to validate to make sure that it is properly cleaned up is that the bind mount that it owns (whatever is under /cloudfuse/JVM-1) is gone. Without that bind mount it has no connection to the underlying FUSE mount created by the cache manager.
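To make the ownership point concrete, here is a small illustration with made-up /proc/mounts entries (the device, filesystem, and option fields are guesses, not captured from a real worker): after step 4 there is one shared FUSE mount plus one bind-mount entry per JVM, and after JVM 1's cleanup only its own entry under /cloudfuse/JVM-1 must be gone.

# Illustrative only: made-up /proc/mounts lines for the example above.
proc_mounts_after_step_4 = '''\
foo /cloudfuse/read_only/abc123/foo fuse.gcsfuse rw,relatime 0 0
foo /cloudfuse/JVM-1/foo fuse.gcsfuse rw,relatime 0 0
foo /cloudfuse/JVM-2/foo fuse.gcsfuse rw,relatime 0 0
'''

# After JVM 1's unmount (step 6), only its bind-mount line disappears.
proc_mounts_after_step_6 = proc_mounts_after_step_4.replace(
    'foo /cloudfuse/JVM-1/foo fuse.gcsfuse rw,relatime 0 0\n', ''
)

# JVM 1's cleanup check only needs to look for its own base path; the shared
# FUSE mount and JVM 2's bind mount legitimately remain.
assert '/cloudfuse/JVM-1' not in proc_mounts_after_step_6
assert '/cloudfuse/read_only/abc123/foo' in proc_mounts_after_step_6
assert '/cloudfuse/JVM-2/foo' in proc_mounts_after_step_6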

@jigold (Contributor Author)

Thanks! This was really helpful.

@danking danking merged commit da6ba69 into hail-is:main May 4, 2023