-
-
Notifications
You must be signed in to change notification settings - Fork 41
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix recursion error during file attachment parsing of deep nested paths #173
Conversation
…ative cancellation
Using example for reproducing issue #172
}, | ||
timeout=self.files_timeout, | ||
) | ||
with time_limit(self.files_timeout): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have a test that asserts this time outs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a test parsing symlink files for timeout, but that doesn't determine if the timeout occured due to the time.monotic()
comparison timeout or by the signal. The signal timeout would be if a very large single file was parsed, or pathlib ends up in some loop.
But also added 2 new tests for time_limit
successfully interrupting time.sleep
and a long-running built-in C function (math.factorial) in 1af5bf0
signal.alarm(timeout) | ||
|
||
try: | ||
yield | ||
finally: | ||
signal.alarm(0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Clever. I don't know enough about signals to say whether this is an appropriate use, but I think it's low risk to try.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems to work pretty accurately for interrupting time.sleep
and built-in C functions in these tests 1af5bf0, so it seems okay for our usecase, and avoids the potential broken pipes issue with multiprocessing.
start_time = time.monotonic() | ||
added = 0 | ||
files = glob.iglob(pattern, root_dir=str(self.output), recursive=True, include_hidden=False) | ||
for file in (Path(self.output, f) for f in files): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Elegant use of a generator expression here!
|
||
def files_list( | ||
self, | ||
limit: int, | ||
pattern: str, | ||
exclude_files: dict[Path, float] | None = None, | ||
preload_dict: bool = False, | ||
timeout: float | None = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thoughts on using float("inf")
(aka math.inf
) instead of None
to specify no timeout? Maybe a bit odd at first glance, but I think it makes sense, and it does avoid dealing with None
checks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like the None is still useful on performance grounds since we currently bypass the time.monotonic
call and comparison when timeout
is None.
Co-authored-by: Mark <1515135+MarkKoz@users.noreply.github.com>
…h `timed.time_limit`
@@ -289,7 +289,7 @@ def python3( | |||
args, None, "TimeoutError: Exceeded time limit while parsing attachments" | |||
) | |||
except Exception as e: | |||
log.error(f"Unexpected {type(e).__name__} while parse attachments: {e}") | |||
log.exception(f"Unexpected {type(e).__name__} while parse attachments", exc_info=e) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
exc_info
is redundant when log.exception
is used within an except
block.
Fixes: #172
RecursionError
from glob / pathlib. On such event we return an exit codeNone
, and stdout:Additionally reduced file parsing time limit from 8 to 5 seconds.
Added inter-loop time limit calculations within
MemFS.files
andMemFS.files_list
loops withtime.monotonic
for cooperative time limit cancellation.Added a unit test with the reproducer from BOT-3EN: snekbox hits a recusion exceded exception in pathlib when parsing lots of nested directories and 500s #172
Due to some weird interactions between
multiprocessing.Pool
and the mp runner for falcon resulting in a broken pipe and deadlock (see https://github.com/python-discord/snekbox/actions/runs/4395009986/jobs/7696475837), switched from mp time limit runner to aSIGALRM
based one.