[batch] Check /proc/mounts for straggler cloudfuse mounts #12986
Conversation
batch/batch/worker/worker.py
Outdated
@@ -419,6 +419,9 @@ async def _fuse_mount(self, destination: str, *, credentials_path: str, tmp_path
     async def _fuse_unmount(self, path: str):
         assert CLOUD_WORKER_API
         await CLOUD_WORKER_API.unmount_cloudfuse(path)
+        proc_mounts_stdout, _ = await check_shell_output('cat /proc/mounts')
+        if path in str(proc_mounts_stdout):
+            raise IncompleteCloudFuseCleanup(f'incomplete cloudfuse unmounting: {proc_mounts_stdout}')
Ah I see now. This looks good, but can we just use Python `open` instead of `cat` to read in the contents of `/proc/mounts`?
with open('/proc/mounts', 'r') as f:
    output = f.read()
if self.cloudfuse_base_path() in output:
    raise IncompleteCloudFuseCleanup(f'incomplete cloudfuse unmounting: {output}')
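For what it's worth, reading the file directly also sidesteps a subtlety in the shell version: if `check_shell_output` returns bytes, `path in str(proc_mounts_stdout)` matches against the `b'...'` repr rather than decoded text. A minimal sketch of the decode-based alternative, assuming bytes output:

proc_mounts_stdout, _ = await check_shell_output('cat /proc/mounts')
mounts = proc_mounts_stdout.decode('utf-8')
if path in mounts:
    raise IncompleteCloudFuseCleanup(f'incomplete cloudfuse unmounting: {mounts}')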
I'm fairly sure exceptions raised in `DockerJob.cleanup` are not logged anywhere. Using the raise to skip the `xfs_quota` and `rmtree` seems fine, but we need to log the exception here.
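A minimal sketch of logging before re-raising; the class skeleton, the logger setup, and the `_unmount_all_cloudfuse` helper are illustrative assumptions standing in for the worker's actual structure, and `IncompleteCloudFuseCleanup` is assumed to be in scope from the repo:

import logging

log = logging.getLogger('batch.worker')

class DockerJob:
    async def _unmount_all_cloudfuse(self):
        # Hypothetical stand-in for the real unmount loop.
        ...

    async def cleanup(self):
        try:
            await self._unmount_all_cloudfuse()
        except IncompleteCloudFuseCleanup:
            # cleanup's caller does not log this exception, so record it
            # here; the re-raise still skips the xfs_quota/rmtree steps.
            log.exception('incomplete cloudfuse cleanup; skipping xfs_quota and rmtree')
            raise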
batch/batch/worker/worker.py
Outdated
with open('/proc/mounts', 'r') as f:
    output = f.read()
if path in output:
    raise IncompleteCloudFuseCleanup(f'incomplete cloudfuse unmounting: {output}')
This would read this file for each mount rather than once after all attempts to unmount. Let's make JVMJob symmetric to DockerJob and check the mounts in `cleanup` in both Jobs. I think we should also make sure we attempt to upload the log before we check /proc/mounts.
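A minimal sketch of that ordering as a `cleanup` method body; the `self.cloudfuse` config list and `upload_log` coroutine are assumed names, while `_fuse_unmount` and `cloudfuse_base_path` come from the diffs above:

async def cleanup(self):
    # Attempt every unmount first.
    for config in self.cloudfuse:
        if config['mounted']:
            await self._fuse_unmount(config['mount_path'])
            config['mounted'] = False

    # Upload the job log before the mount check so a raised
    # IncompleteCloudFuseCleanup cannot lose the log.
    await self.upload_log()

    # A single /proc/mounts read after all unmount attempts,
    # rather than one read per mount.
    with open('/proc/mounts', 'r') as f:
        output = f.read()
    if self.cloudfuse_base_path() in output:
        raise IncompleteCloudFuseCleanup(f'incomplete cloudfuse unmounting: {output}')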
Ah true, there's no need to put this logic in the cloudfuse cache because we never `rmtree` things in there. The attack vector is any bind mount left over in a directory that we will subsequently `rmtree`, so the way to handle the JVMJob case is to check /proc/mounts at the end of a JVM job for any leftover mounts in the scratch directory.
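One way to express that check, sketched here as a hypothetical helper (`mounts_under` is not an existing function in the repo); it parses the mount-point field of /proc/mounts instead of substring-matching the whole file:

from typing import List

def mounts_under(directory: str) -> List[str]:
    # /proc/mounts fields: device, mount point, fstype, options, dump, pass.
    prefix = directory.rstrip('/') + '/'
    leftover = []
    with open('/proc/mounts', 'r') as f:
        for line in f:
            mount_point = line.split()[1]
            if mount_point.startswith(prefix):
                leftover.append(mount_point)
    return leftover

If `mounts_under(scratch_dir)` is non-empty at the end of the job, an `rmtree` of the scratch directory would cross a live mount, which is exactly what the raise is guarding against.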
Why are we checking for paths in the scratch directory? Should I be checking for the scratch directory in Docker jobs as well?
batch/batch/worker/worker.py
Outdated
@@ -2234,6 +2230,11 @@ async def cleanup(self):
             await self.jvm.cloudfuse_mount_manager.unmount(mount_path, user=self.user, bucket=bucket)
             config['mounted'] = False

+        with open('/proc/mounts', 'r') as f:
+            output = f.read()
+        if self.cloudfuse_base_path() in output:
I think my question is: won't this fuse path remain alive if there are other jobs reading the data? That's why I had the check after we actually call `fusermount`.
This is not a fuse path, it's a bind mount path, which is the thing that the JVMJob actually owns. Here's an example:

- JVM 1 wants bucket `foo` at `/cloudfuse/JVM-1/foo`
- FUSE cache manager makes two mounts:
  - FUSE mount at `/cloudfuse/read_only/abc123/foo`
  - Bind mounts `/cloudfuse/read_only/abc123/foo` to `/cloudfuse/JVM-1/foo`
- JVM 2 wants bucket `foo` at `/cloudfuse/JVM-2/foo`
- FUSE cache manager reuses the underlying FUSE mount and bind mounts `/cloudfuse/read_only/abc123/foo` to `/cloudfuse/JVM-2/foo`.
- JVM 1 finishes and wants to clean up, so it calls `cloudfuse_mount_manager.unmount`
- FUSE cache manager unmounts the bind mount at `/cloudfuse/JVM-1/foo`. The underlying FUSE mount is still being used by another job, so it doesn't do anything else.

At this point, the only thing JVM 1 needs to validate to make sure that it is properly cleaned up is that the bind mount that it owns (whatever is under `/cloudfuse/JVM-1`) is gone. Without that bind mount it has no connection to the underlying FUSE mount created by the cache manager.
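The refcounting described above might look roughly like this; the class and helper names (`make_fuse_mount`, `make_bind_mount`, `unmount_path`) are illustrative assumptions, not the actual cache manager implementation:

class CloudfuseMountCache:
    # One shared FUSE mount per bucket; each job owns only its bind mount.
    def __init__(self):
        # bucket -> (shared FUSE mount path, bind mount paths still using it)
        self.active = {}

    async def mount(self, bucket, fuse_path, bind_path):
        if bucket not in self.active:
            await make_fuse_mount(bucket, fuse_path)  # hypothetical FUSE mount helper
            self.active[bucket] = (fuse_path, set())
        shared, users = self.active[bucket]
        await make_bind_mount(shared, bind_path)      # hypothetical `mount --bind`
        users.add(bind_path)

    async def unmount(self, bucket, bind_path):
        shared, users = self.active[bucket]
        # The job-owned bind mount always goes away...
        await unmount_path(bind_path)                 # hypothetical `umount`
        users.discard(bind_path)
        # ...but the shared FUSE mount survives until its last user is gone.
        if not users:
            await unmount_path(shared)
            del self.active[bucket]

Under that scheme, JVM 1's post-cleanup check only needs to confirm that nothing under `/cloudfuse/JVM-1` appears in /proc/mounts.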
Thanks! This was really helpful.