-
Notifications
You must be signed in to change notification settings - Fork 244
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[batch] Don't rmtree if any errors occur while unmounting #12985
Conversation
It would be great in a separate PR to check for sure there aren't any zombie fuse mounts before removing files. |
batch/batch/worker/worker.py
Outdated
except asyncio.CancelledError: | ||
raise | ||
except Exception: | ||
log.exception(f'error while unmounting cloudfuse for {self.jvm_name} for job {self.id}') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's include the same information we have above:
log.exception(f'error while unmounting cloudfuse for {self.jvm_name} for job {self.id}') | |
log.exception( | |
f'while unmounting fuse blob storage {bucket} from {mount_path} for {self.jvm_name} for job {self.id}' | |
) |
batch/batch/worker/worker.py
Outdated
except IncompleteJVMCleanupError: | ||
assert isinstance(job, JVMJob) | ||
await self.recreate_jvm(job.jvm) | ||
log.exception(f'while running {job}, incomplete cleanup from {job.jvm}') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This results in two exceptional logs for a single exception. Let's either not log here or not log near the raise.
57703d3
to
86f266a
Compare
batch/batch/worker/worker.py
Outdated
raise IncompleteJVMCleanupError | ||
raise IncompleteJVMCleanupError( | ||
f'while unmounting fuse blob storage {bucket} from {mount_path} for {self.jvm_name} for job {self.id}' | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Name the exception and add a from e
so that we get the fulls tack trace in the log.exception later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
x
batch/batch/worker/worker.py
Outdated
@@ -2312,6 +2320,11 @@ class JVMCreationError(Exception): | |||
pass | |||
|
|||
|
|||
class IncompleteJVMCleanupError(Exception): | |||
def __init__(self, msg): | |||
self.msg = msg |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't necessary:
In [1]: class IncompleteJVMCleanupError(Exception):
...: pass
...:
In [2]: raise IncompleteJVMCleanupError('foo')
---------------------------------------------------------------------------
IncompleteJVMCleanupError Traceback (most recent call last)
<ipython-input-2-a8a0c94b0ce4> in <cell line: 1>()
----> 1 raise IncompleteJVMCleanupError('foo')
IncompleteJVMCleanupError: foo
These changes look great |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
assert isinstance(job, JVMJob) | ||
await self.recreate_jvm(job.jvm) | ||
log.exception(e.msg) | ||
log.exception(f'while running {job}, ignoring') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@danking Can you double check the actual exception message will appear in the logs here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, log.exception
does this.
I think this change is better than #12975