-
Notifications
You must be signed in to change notification settings - Fork 691
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Regenerate Lexer.hs with latest Alex (fix #8892) #8896
Conversation
779f6dc
to
b5717e6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Sounds like something worth backporting to 3.10, @Mikolaj ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, definitely, let's backport.
I guess some old test results require a bit of tweaking, too.
@mergify backport 3.10 |
✅ Backports have been created
|
b5717e6
to
8a34cb2
Compare
I've updated to require Alex 3.2.7.3 which doesn't generate files that require |
@hsyl20 when you have time, could you look into the test failure? I guess, just updating the reference should be enough? (I haven't looked too close though) Updating (via |
I've updated the test. However why did we have to update this? It could be an Alex bug: previously, two unbreakable whitespaces were correctly reported at column 3; now they are reported at column 5, which makes no sense except if column is expressed in bytes instead of characters. |
The new failure seems to confirm this. |
We use a Latin1 generated parser with Alex, but we also parses Unicode BOM, unbreakable spaces, etc. In recent Alex, the reported column isn't expressed in Unicode chars anymore but in bytes/ASCII chars (probably due to haskell/alex@ae525e3 but I haven't checked), which broke our tests (see haskell#8896). To work around this we report indentation warnings at token start position, instead of token end position (i.e. always 1). Otherwise position makes no sense anymore for the user.
New commit with message:
I've tried to switch to a utf8 lexer instead of latin1 but it makes many more tests fail. I don't even know if |
This all sounds a bit worrying. @andreasabel, as the author of the corresponding For the sake of discussion, maybe we should preserve links to failing workflows. I know they will die at some point, but at least it gives us some time to contemplate on different approaches. |
new_s = if GTE(offset,0#) && EQ(check,ord_c) | ||
new_s = if GTE(offset,ILIT(0)) | ||
&& let check = alexIndexInt16OffAddr alex_check offset | ||
in EQ(check,ord_c) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comes from this Alex commit: haskell/alex@e96c347
(#224)
I do not see how this should have introduced the problems we are seeing.
GTE
ILIT
and EQ
are macros, yes, but they do not have side effects, so moving the check
binding into a local let
should not be a semantic change except making things lazier.
I currently do not have time to dig deeper, but I think the latest |
Yes I also think it's just the result of this change: haskell/alex@ae525e3#diff-007f894e1221eb8cafde8fdf0ee317bd32859704c27652c29cbb5417a9d5c37dR179-R185 Which has been introduced in Alex 3.2.7 |
We use a Latin1 generated parser with Alex, but we also parses Unicode BOM, unbreakable spaces, etc. In recent Alex, the reported column isn't expressed in Unicode chars anymore but in bytes/ASCII chars (probably due to haskell/alex@ae525e3 but I haven't checked), which broke our tests (see haskell#8896). To work around this we report indentation warnings at token start position, instead of token end position (i.e. always 1). Otherwise position makes no sense anymore for the user.
cef2d51
to
4574b38
Compare
Ok, this was 3 years ago, I don't remember much of it. |
Technically, this has two approvals and can be merged: the author is expected to add either |
I've opened an upstream ticket: haskell/alex#227 |
Regenerate Lexer.hs with Alex 3.2.7.2 to fix issue #haskell#8892 (out-of-bound access due to haskell/alex#223).
We use a Latin1 generated parser with Alex, but we also parses Unicode BOM, unbreakable spaces, etc. In recent Alex, the reported column isn't expressed in Unicode chars anymore but in bytes/ASCII chars (probably due to haskell/alex@ae525e3 but I haven't checked), which broke our tests (see haskell#8896). To work around this we report indentation warnings at token start position, instead of token end position (i.e. always 1). Otherwise position makes no sense anymore for the user.
4574b38
to
5f72880
Compare
We use a Latin1 generated parser with Alex, but we also parses Unicode BOM, unbreakable spaces, etc. In recent Alex, the reported column isn't expressed in Unicode chars anymore but in bytes/ASCII chars (probably due to haskell/alex@ae525e3 but I haven't checked), which broke our tests (see #8896). To work around this we report indentation warnings at token start position, instead of token end position (i.e. always 1). Otherwise position makes no sense anymore for the user. (cherry picked from commit 5f72880)
The documentation confirms they are UTF-8.
|
So in theory we could use Alex unicode support. But since then I've read in https://amelia.how/posts/parsing-layout.html that:
|
- this matches GHC CI - Fixes a bug in the JS backend see - https://gitlab.haskell.org/ghc/ghc/-/issues/22356 - haskell/cabal#8896 - https://gitlab.haskell.org/ghc/ghc/-/merge_requests/11444#note_552521https://gitlab.haskell.org/ghc/ghc/-/merge_requests/11444#note_552521
- this matches GHC CI - Fixes a bug in the JS backend see - https://gitlab.haskell.org/ghc/ghc/-/issues/22356 - haskell/cabal#8896 - https://gitlab.haskell.org/ghc/ghc/-/merge_requests/11444#note_552521https://gitlab.haskell.org/ghc/ghc/-/merge_requests/11444#note_552521
Regenerate Lexer.hs with Alex 3.2.7.2 to fix issue ##8892 (out-of-bound access due to haskell/alex#223).
Please include the following checklist in your PR:
Please also shortly describe how you tested your change. Bonus points for added tests!