-
Notifications
You must be signed in to change notification settings - Fork 12.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
assertion failed: bpos.to_usize() >= mbc.pos.to_usize() + mbc.bytes #37274
Comments
IIRC @nox worked around another instance of the same assertion failure by removing some non-ASCII characters in a string constant (replacing them with |
I can confirm. |
It didn't help this time though, can't find any UTF-8 stuff to remove for that crate, even transitively. |
I tried to look into this:
|
Forget my last comment, apparently the problematic file is |
Work around an unfortunate ICE rust-lang/rust#37274 <!-- Reviewable:start --> --- This change is [<img src="https://reviewable.io/review_button.svg" height="34" align="absmiddle" alt="Reviewable"/>](https://reviewable.io/reviews/servo/rust-url/233) <!-- Reviewable:end -->
@TimNN Maybe, but most files in rust-url are UTF-8 encoded with fancy chars, this definitely needs a rustc fix because we can't decently work around it. |
@nox: So it's still failing? I agree that this needs a proper fix in rustc, however the problem seems to be annoyingly hard to debug: the |
Okay I have reduced this problem down to a set of four crates, themselves reduced from the Servo source. (The four crates still are relying on a number of other crates in the crates.io ecosystem, most notably Serde.) |
After adding some extra debug instrumentation to the codemap, I wonder if this bug is being somehow tickled by the As an example of what I am referring to, see: https://github.com/servo/rust-url/blob/master/src/lib.rs#L26 which has the text
which has a (To be clear, this is still a compiler bug. The presence of characters like this should not be causing problems in Update: just as an FYI, this is the (roughly transcribed) line of
|
In addition to the various smart quotes, there are also some embedded ellipses, another non-ascii character. In emacs, you can interactively search for all such things by doing: |
Yes, I regularly type non-ASCII characters (mostly punctuation) in English so I’m not surprised to have them all over a place in a crate I maintain. For what it’s worth, from looking only very superficially at libsyntax code, the whole concept of |
I think the compiler will often include a character position within the file as part of its user feedback in error messages. I'm not sure how realistic it is to remove that abstraction for thinking about source code. (Also, |
Could we just remove the assertion? This is blocking Servo and the worst case would be a misplaced span. I think it's insane to ICE for that. |
@nox My opinion on bugs like this is that making them ICE ensures that the bugs actually get fixed. Its a broken windows type theory: removing the ICE and leaving the potential for a misplaced span is to me like taping up cardboard over a broken window; I'd rather fix the window. |
Continuing the analogy, if it takes a long time before your contractor can come fix the window, it's still better to cover it in the meantime. |
@Ms2ger I can't really argue with that. I'm going to keep trying to identify the issue (I think I'm close), but I wouldn't veto a PR that temporarily comments out the line in question (and has an other comment linking to this bug). |
This issue looks very confusing - bad debuginfo caused by misplaced spans can be quite a bit of trouble. |
@pnkfelix Btw, that DEBUG message you pasted does not correspond to the first
|
As I mentioned on IRC, a code point count is an approximation (maybe good enough in many cases) of the width taken by some text in a monospace font, but it is not the same. Unicode contains both "combining" code point that don’t take horizontal space of their own, and "double-width" characters (many from Asian languages) that take two columns. https://github.com/unicode-rs/unicode-width implements an algorithm to deal with this. It was once upon a time part of Regardless (even if the approximation is kept instead of the Unicode algorithm), I think in the long term it would be beneficial to refactor |
Do not intern filemap to entry w/ mismatched length. Do not intern filemap to entry w/ mismatched length. Fix #37274 (I think). Beta-nominated; note that only the second commit needs to be cherry picked to beta branch. (The first just adds some debug instrumentation that I wish had been present.)
Hit on running
./mach test-unit
on https://github.com/Ms2ger/servo/tree/rustc-bug-codemap on travis.The text was updated successfully, but these errors were encountered: