Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

scanner: eliminate panics via unwrap() #24

Closed
wants to merge 1 commit into from
Closed

Conversation

davvid
Copy link
Collaborator

@davvid davvid commented Apr 11, 2024

Ensure that no code path leads to a panic by plugging up all of the error scenarios that are currently using .unwrap(). This improves robustness by handling errors instead of panicking.

Ensure that no code path leads to a panic by plugging up all of
the error scenarios that are currently using .unwrap().
This improves robustness by handling errors instead of panicking.
Copy link
Owner

@Ethiraric Ethiraric left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, I understand why one would want to remove panics, but I do not want errors to be silenced. I would rather have the code emit an error (whether it be Err or panics) than silently ignore them, which could cause hard to track errors.

We could have most functions return Results, but this would incur a runtime overhead.

Comment on lines +466 to +472
if self
.buffer
.push_back(self.rdr.next().unwrap_or('\0'))
.unwrap();
.is_err()
{
break;
}
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this change is the right thing to do. The buffer errors when we try to push past its capacity (since it is fixed and cannot reallocate). This is something that, if we hit, is because count > self.buffer.len(), which is a logic error in the scanner that should be fixed not in lookahead but on its call site.

The unwrap here should never be hit. Breaking out of the loop might trigger a plethora of panics in the scanner, since the code assumes that, after calling lookahead(n), it is safe to access buffer[0..n-1].

In my opinion, the best course of action here would be to rather assert(count <= self.buffer.len()), ensuring we never call lookahead with a count that is too high. But that would go against removing the panics. I'd gladly have your opinion on this.

@@ -1470,7 +1474,7 @@ impl<T: Iterator<Item = char>> Scanner<T> {
fn decrease_flow_level(&mut self) {
if self.flow_level > 0 {
self.flow_level -= 1;
self.simple_keys.pop().unwrap();
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removing this unwrap would silence a potential error in the scanner logic.

@@ -1777,7 +1781,7 @@ impl<T: Iterator<Item = char>> Scanner<T> {
// Our last character read is stored in `c`. It is either an EOF or a break. In any
// case, we need to push it back into `self.buffer` so it may be properly read
// after. We must not insert it in `string`.
self.buffer.push_back(c).unwrap();
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removing this unwrap would silence a potential error in the scanner logic.

@@ -2365,7 +2374,9 @@ impl<T: Iterator<Item = char>> Scanner<T> {
);
self.roll_one_col_indent();

self.simple_keys.last_mut().unwrap().possible = false;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removing this unwrap would silence a potential error in the scanner logic.

@@ -2447,10 +2458,14 @@ impl<T: Iterator<Item = char>> Scanner<T> {
return;
}
while self.indent > col {
let indent = self.indents.pop().unwrap();
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removing this unwrap would silence a potential error in the scanner logic.

@@ -2486,7 +2501,10 @@ impl<T: Iterator<Item = char>> Scanner<T> {
if self.simple_key_allowed {
let required = self.flow_level == 0
&& self.indent == (self.mark.col as isize)
&& self.indents.last().unwrap().needs_block_end;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removing this unwrap would silence a potential error in the scanner logic.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could also be self.indents.last().map_or(false, |ident| ident.needs_block_end).

@davvid
Copy link
Collaborator Author

davvid commented Apr 12, 2024

We could have most functions return Results, but this would incur a runtime overhead.

Returning Result<...>s seems like the way to go to me. Hopefully we can keep the overhead to an absolute minimum.

In order to minimize the overhead we could have some of these internal functions return Result<(), some_enum> with empty, data-free repr u8 or super minimal enums that can be stack allocated (i.e. we'd avoid error strings).

In theory this should allow the Result to fit within a single i32 (or smaller). If you think this is worth a try I can take a stab at making things work that way.

Thanks for the careful review. I figured I'd start with the scanner and then we can apply the same pattern to the parser as a follow-up. That way there's no chance that we can ever panic, which should improve confidence for folks that might want to use this crate in scenarios such as long-running servers where panicking is not an option.

Let me know what you think. I'll mark this as a draft for now.

@davvid davvid marked this pull request as draft April 12, 2024 02:22
@Ethiraric
Copy link
Owner

I think using an enum would be a good start, at least for measuring. I'm really afraid of what adding error management to lookahead might do to performance. It is called very frequently within the scanner.

Also, do you think you could direct your changes to saphyr instead? You should have received owner permissions to the organisation and its repositories.
If we are to make architectural changes, I'd rather we be totally free of what changes we make rather than to be bound by an api.

@davvid davvid closed this Apr 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants