New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Sign up for GitHub

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jump to bottom

scanner: eliminate panics via unwrap() #24

Closed

davvid wants to merge 1 commit into master from scanner-panic

Collaborator

davvid commented Apr 11, 2024

Ensure that no code path leads to a panic by plugging up all of the error scenarios that are currently using .unwrap(). This improves robustness by handling errors instead of panicking.


          scanner: eliminate panics via unwrap()

3b655d6

Ensure that no code path leads to a panic by plugging up all of
the error scenarios that are currently using .unwrap().
This improves robustness by handling errors instead of panicking.

davvid force-pushed the scanner-panic branch from 0d581de to 3b655d6 Compare

April 11, 2024 10:32

Ethiraric reviewed

View reviewed changes

Owner

Ethiraric left a comment

Overall, I understand why one would want to remove panics, but I do not want errors to be silenced. I would rather have the code emit an error (whether it be Err or panics) than silently ignore them, which could cause hard to track errors.

We could have most functions return Results, but this would incur a runtime overhead.

src/scanner.rs

Comment on lines +466 to +472

+                          if self
+                              .buffer
                               .push_back(self.rdr.next().unwrap_or('\0'))
-                              .unwrap();
+                              .is_err()
+                          {
+                              break;
+                          }

Owner

Ethiraric Apr 11, 2024

I don't think this change is the right thing to do. The buffer errors when we try to push past its capacity (since it is fixed and cannot reallocate). This is something that, if we hit, is because count > self.buffer.len(), which is a logic error in the scanner that should be fixed not in lookahead but on its call site.

The unwrap here should never be hit. Breaking out of the loop might trigger a plethora of panics in the scanner, since the code assumes that, after calling lookahead(n), it is safe to access buffer[0..n-1].

In my opinion, the best course of action here would be to rather assert(count <= self.buffer.len()), ensuring we never call lookahead with a count that is too high. But that would go against removing the panics. I'd gladly have your opinion on this.

src/scanner.rs

@@ @@ -1470,7 +1474,7 @@ impl<T: Iterator<Item = char>> Scanner<T> { @@
                   fn decrease_flow_level(&mut self) {
                       if self.flow_level > 0 {
                           self.flow_level -= 1;
-                          self.simple_keys.pop().unwrap();

Owner

Ethiraric Apr 11, 2024

Removing this unwrap would silence a potential error in the scanner logic.

src/scanner.rs

@@ @@ -1777,7 +1781,7 @@ impl<T: Iterator<Item = char>> Scanner<T> { @@
                           // Our last character read is stored in `c`. It is either an EOF or a break. In any
                           // case, we need to push it back into `self.buffer` so it may be properly read
                           // after. We must not insert it in `string`.
-                          self.buffer.push_back(c).unwrap();

Owner

Ethiraric Apr 11, 2024

Removing this unwrap would silence a potential error in the scanner logic.

src/scanner.rs

@@ @@ -2365,7 +2374,9 @@ impl<T: Iterator<Item = char>> Scanner<T> { @@
                           );
                           self.roll_one_col_indent();
-                          self.simple_keys.last_mut().unwrap().possible = false;

Owner

Ethiraric Apr 11, 2024

Removing this unwrap would silence a potential error in the scanner logic.

src/scanner.rs

@@ @@ -2447,10 +2458,14 @@ impl<T: Iterator<Item = char>> Scanner<T> { @@
                           return;
                       }
                       while self.indent > col {
-                          let indent = self.indents.pop().unwrap();

Owner

Ethiraric Apr 11, 2024

Removing this unwrap would silence a potential error in the scanner logic.

src/scanner.rs

@@ @@ -2486,7 +2501,10 @@ impl<T: Iterator<Item = char>> Scanner<T> { @@
                       if self.simple_key_allowed {
                           let required = self.flow_level == 0
                               && self.indent == (self.mark.col as isize)
-                              && self.indents.last().unwrap().needs_block_end;

Owner

Ethiraric Apr 11, 2024

Removing this unwrap would silence a potential error in the scanner logic.

Owner

Ethiraric Apr 11, 2024

This could also be self.indents.last().map_or(false, |ident| ident.needs_block_end).

Collaborator Author

davvid commented Apr 12, 2024

We could have most functions return Results, but this would incur a runtime overhead.

Returning Result<...>s seems like the way to go to me. Hopefully we can keep the overhead to an absolute minimum.

In order to minimize the overhead we could have some of these internal functions return Result<(), some_enum> with empty, data-free repr u8 or super minimal enums that can be stack allocated (i.e. we'd avoid error strings).

In theory this should allow the Result to fit within a single i32 (or smaller). If you think this is worth a try I can take a stab at making things work that way.

Thanks for the careful review. I figured I'd start with the scanner and then we can apply the same pattern to the parser as a follow-up. That way there's no chance that we can ever panic, which should improve confidence for folks that might want to use this crate in scenarios such as long-running servers where panicking is not an option.

Let me know what you think. I'll mark this as a draft for now.

davvid marked this pull request as draft

April 12, 2024 02:22

Owner

Ethiraric commented Apr 13, 2024

I think using an enum would be a good start, at least for measuring. I'm really afraid of what adding error management to lookahead might do to performance. It is called very frequently within the scanner.

Also, do you think you could direct your changes to saphyr instead? You should have received owner permissions to the organisation and its repositories.
If we are to make architectural changes, I'd rather we be totally free of what changes we make rather than to be bound by an api.

davvid closed this

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet