assertion failed: bpos.to_usize() >= mbc.pos.to_usize() + mbc.bytes #37274

Closed
Ms2ger opened this issue Oct 19, 2016 · 20 comments
Assignees
Labels
I-ICE Issue: The compiler panicked, giving an Internal Compilation Error (ICE) ❄️ P-high High priority T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@Ms2ger
Contributor

Ms2ger commented Oct 19, 2016

Hit while running ./mach test-unit on https://github.com/Ms2ger/servo/tree/rustc-bug-codemap on Travis.

12.18s$ ./mach test-unit
   Compiling util_tests v0.0.1 (file:///home/travis/build/servo/servo/tests/unit/util)
   Compiling profile_tests v0.0.1 (file:///home/travis/build/servo/servo/tests/unit/profile)
   Compiling net_traits_tests v0.0.1 (file:///home/travis/build/servo/servo/tests/unit/net_traits)
   Compiling net_tests v0.0.1 (file:///home/travis/build/servo/servo/tests/unit/net)
   Compiling gfx_tests v0.0.1 (file:///home/travis/build/servo/servo/tests/unit/gfx)
   Compiling script_tests v0.0.1 (file:///home/travis/build/servo/servo/tests/unit/script)
   Compiling layout_tests v0.0.1 (file:///home/travis/build/servo/servo/tests/unit/layout)
error: internal compiler error: unexpected panic
note: the compiler unexpectedly panicked. this is a bug.
note: we would appreciate a bug report: https://github.com/rust-lang/rust/blob/master/CONTRIBUTING.md#bug-reports
note: run with `RUST_BACKTRACE=1` for a backtrace
thread 'rustc' panicked at 'assertion failed: bpos.to_usize() >= mbc.pos.to_usize() + mbc.bytes', ../src/libsyntax/codemap.rs:707
stack backtrace:
   1:     0x7f623f0f79df - std::sys::backtrace::tracing::imp::write::h22f199c1dbb72ba2
   2:     0x7f623f106f0d - std::panicking::default_hook::{{closure}}::h9a389c462b6a22dd
   3:     0x7f623f104386 - std::panicking::default_hook::h852b4223c1c00c59
   4:     0x7f623f104a68 - std::panicking::rust_panic_with_hook::hcd9d05f53fa0dafc
   5:     0x7f623b79a92f - std::panicking::begin_panic::hc03e2830c2c89a5f
   6:     0x7f623b7ff9c0 - syntax::codemap::CodeMap::bytepos_to_file_charpos::h3e1ecefe8c872433
   7:     0x7f623b7fc85f - syntax::codemap::CodeMap::lookup_char_pos::hfba960093e9845af
   8:     0x7f623dd0e8d7 - rustc_trans::mir::MirContext::scope_metadata_for_loc::hf4ec7e4a54af690a
   9:     0x7f623dd0e71c - rustc_trans::mir::MirContext::debug_loc::h081d6b2d00268062
  10:     0x7f623dd12a44 - rustc_trans::mir::block::<impl rustc_trans::mir::MirContext<'bcx, 'tcx>>::trans_block::he67d3259f79e4177
  11:     0x7f623dd10b58 - rustc_trans::mir::trans_mir::h2fb44ecb31cfdffa
  12:     0x7f623dcb2cdc - rustc_trans::base::trans_closure::h941de14309416d66
  13:     0x7f623dd30a34 - rustc_trans::trans_item::TransItem::define::ha4a18b94a3d46bf3
  14:     0x7f623dcb6254 - rustc_trans::base::trans_crate::h9b06de31ed8799d1
  15:     0x7f623f48cd4d - rustc_driver::driver::phase_4_translate_to_llvm::hc3883ea2c4750179
  16:     0x7f623f4c7cf7 - rustc_driver::driver::compile_input::{{closure}}::h9162a2fa292aeb3f
  17:     0x7f623f4beef3 - rustc_driver::driver::phase_3_run_analysis_passes::{{closure}}::h1928c4704cfe9c61
  18:     0x7f623f48a6ed - rustc_driver::driver::phase_3_run_analysis_passes::he578df6b8805151c
  19:     0x7f623f476f69 - rustc_driver::driver::compile_input::h5b63ccd49eeeb98b
  20:     0x7f623f4a02ba - rustc_driver::run_compiler::h98c7274e7cb1d11d
  21:     0x7f623f3d9f0b - std::panicking::try::do_call::h99ed0da044e497c3
  22:     0x7f623f10ee06 - __rust_maybe_catch_panic
  23:     0x7f623f3f8461 - <F as alloc::boxed::FnBox<A>>::call_box::hbdd5a14cd8e33b97
  24:     0x7f623f102de0 - std::sys::thread::Thread::new::thread_start::h50b05608a499d2b2
  25:     0x7f62373da183 - start_thread
  26:     0x7f623edbe37c - clone
  27:                0x0 - <unknown>
Build failed, waiting for other jobs to finish...
error: Could not compile `net_tests`.
To learn more, run the command again with --verbose.
The command "./mach test-unit" exited with 101.

@SimonSapin
Contributor

Looks like the failing assertion is the same as in #24687, which was fixed in #24932. CC @pnkfelix who made that fix.

@SimonSapin
Contributor

IIRC @nox worked around another instance of the same assertion failure by removing some non-ASCII characters in a string constant (replacing them with \u{…}) in a dependency of the crate failing to compile.
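
For readers unfamiliar with the workaround: a Rust \u{…} escape yields exactly the same string as the literal character while keeping the source file ASCII-only. A minimal illustration follows; the constant names and the sample text are made up for this example and are not taken from the crate in question.

```rust
// Hypothetical constants, for illustration only.
// The same text, written once with a literal U+2019 and once with an escape.
const LITERAL: &str = "Cargo’s features";
const ESCAPED: &str = "Cargo\u{2019}s features";

fn main() {
    // Both constants hold identical bytes at run time...
    assert_eq!(LITERAL, ESCAPED);
    // ...but only the escaped form keeps the source text ASCII-only.
    println!("{} bytes, {} chars", ESCAPED.len(), ESCAPED.chars().count());
}
```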

@nox
Contributor

nox commented Oct 19, 2016

I can confirm.

@nox
Contributor

nox commented Oct 19, 2016

It didn't help this time though: I can't find any non-ASCII characters to remove for that crate, even transitively.

@TimNN
Contributor

TimNN commented Oct 20, 2016

I tried to look into this:

  • I can reproduce this on Linux.
  • I looked a bit at the syntax=debug log, and it may be possible that the problematic chars originate from proc-macro-generated input. (Edit: apparently not.)

@TimNN
Contributor

TimNN commented Oct 20, 2016

Forget my last comment; apparently the problematic file is lib.rs from the url crate, which contains a non-ASCII character in a doc comment: https://github.com/servo/rust-url/blob/master/src/lib.rs#L765

nox added a commit to nox/rust-url that referenced this issue Oct 20, 2016
bors-servo pushed a commit to servo/rust-url that referenced this issue Oct 20, 2016
Work around an unfortunate ICE

rust-lang/rust#37274

@nox
Contributor

nox commented Oct 21, 2016

@TimNN Maybe, but most files in rust-url are UTF-8 encoded with fancy chars. This definitely needs a rustc fix because we can't decently work around it.

@TimNN
Contributor

TimNN commented Oct 21, 2016

@nox: So it's still failing?

I agree that this needs a proper fix in rustc, however the problem seems to be annoyingly hard to debug: the syntax=debug log is practically useless (at least to me) and I cannot easily reproduce this problem: just having an extern crate with fancy chars in doc comments & using it does not trigger the problem.

@pnkfelix pnkfelix self-assigned this Oct 21, 2016
@pnkfelix pnkfelix added T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. I-nominated labels Oct 21, 2016
@pnkfelix
Member

Okay, I have reduced this problem down to a set of four crates, themselves reduced from the Servo source.

(The four crates still rely on a number of other crates in the crates.io ecosystem, most notably Serde.)

https://github.com/pnkfelix/rust-issues/tree/issue-37274

@pnkfelix
Member

pnkfelix commented Oct 26, 2016

After adding some extra debug instrumentation to the codemap, I wonder if this bug is being somehow tickled by the url crate's use of smart quotes...

As an example of what I am referring to, see:

https://github.com/servo/rust-url/blob/master/src/lib.rs#L26

which has the text

[Cargo’s *features* mechanism](http://doc.crates.io/manifest.html#the-[features]-section):

which has a ’ (codepoint 0x2019) where I might have instead expected a ' (codepoint 0x27).

(To be clear, this is still a compiler bug. The presence of characters like this should not be causing problems in rustc...)


Update: just as an FYI, this is the (roughly transcribed) line of DEBUG output that led me to wonder about the url crate here:

DEBUG:syntax::codemap: 3-byte char at BytePos(11348), bpos: BytePos(11350), filemap: \
  ("name", "/home/fklock/.multirust/toolchains/nightly/cargo/registry/src/github.com-1ecc6299db9ec823/url-1.2.2/src/lib.rs", \
   "start_pos", BytePos(9968), \
   "end_pos", BytePos(63149))
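
To see why these numbers trip the assertion, the condition from the panic message can be evaluated by hand: a 3-byte character starting at byte 11348 occupies bytes 11348 through 11350, so a position of 11350 falls inside it. The sketch below only mirrors the asserted inequality; the struct and function names are simplified stand-ins chosen for this example, and the actual code in codemap.rs may be organized differently.

```rust
/// Simplified stand-in for a multibyte-character record: the byte offset
/// where the character starts and its UTF-8 length in bytes.
struct MultiByteChar {
    pos: usize,
    bytes: usize,
}

/// Mirrors the inequality in the failing assertion: a byte position past
/// the start of a multibyte character must not land inside that character.
fn invariant_holds(bpos: usize, mbc: &MultiByteChar) -> bool {
    bpos >= mbc.pos + mbc.bytes
}

fn main() {
    // Values taken from the DEBUG line above.
    let mbc = MultiByteChar { pos: 11348, bytes: 3 };
    // 11350 < 11348 + 3 = 11351, so the invariant is violated.
    assert!(!invariant_holds(11350, &mbc));
    // The first byte after the character would satisfy it.
    assert!(invariant_holds(11351, &mbc));
}
```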

@pnkfelix
Member

pnkfelix commented Oct 26, 2016

In addition to the various smart quotes, there are also some embedded ellipses, another non-ascii character.

In emacs, you can interactively search for all such things by doing: M-x isearch-forward-regexp [^[:ascii:]], which will highlight them all (and you can then just keep hitting C-s to jump to the next one in the buffer).
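
Outside of emacs, the same search can be sketched in a few lines of Rust using only the standard library. The helper name below is made up for this example; it reports the byte offset of each non-ASCII character, which is the kind of offset the codemap DEBUG output refers to.

```rust
/// Returns (byte offset, character) for every non-ASCII character in `text`.
/// Hypothetical helper, not part of any crate discussed here.
fn non_ascii_chars(text: &str) -> Vec<(usize, char)> {
    text.char_indices().filter(|(_, c)| !c.is_ascii()).collect()
}

fn main() {
    let line = "Let’s parse a valid URL and look at its components.";
    for (offset, c) in non_ascii_chars(line) {
        println!(
            "byte {}: {:?} (U+{:04X}, {} bytes in UTF-8)",
            offset,
            c,
            c as u32,
            c.len_utf8()
        );
    }
}
```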

@SimonSapin
Contributor

Yes, I regularly type non-ASCII characters (mostly punctuation) in English so I’m not surprised to have them all over the place in a crate I maintain.

For what it’s worth, from looking only very superficially at libsyntax code, the whole concept of CharPos seems dubious. Would it be possible or make sense to remove it and use BytePos everywhere?

@pnkfelix
Member

> For what it’s worth, from looking only very superficially at libsyntax code, the whole concept of CharPos seems dubious. Would it be possible or make sense to remove it and use BytePos everywhere?

I think the compiler will often include a character position within the file as part of its user feedback in error messages. I'm not sure how realistic it is to remove that abstraction for thinking about source code.

(Also, ack char_pos src/lib* yields around 50 or so matches...)

@nox
Contributor

nox commented Oct 27, 2016

Could we just remove the assertion? This is blocking Servo and the worst case would be a misplaced span. I think it's insane to ICE for that.

@pnkfelix
Member

@nox My opinion on bugs like this is that making them ICE ensures that the bugs actually get fixed. It's a broken-windows type of theory: removing the ICE and leaving the potential for a misplaced span is to me like taping up cardboard over a broken window; I'd rather fix the window.

@Ms2ger
Contributor Author

Ms2ger commented Oct 27, 2016

Continuing the analogy, if it takes a long time before your contractor can come fix the window, it's still better to cover it in the meantime.

@pnkfelix
Member

@Ms2ger I can't really argue with that. I'm going to keep trying to identify the issue (I think I'm close), but I wouldn't veto a PR that temporarily comments out the line in question (and has another comment linking to this bug).

@pnkfelix pnkfelix added the P-high High priority label Oct 27, 2016
@arielb1 arielb1 added the I-ICE Issue: The compiler panicked, giving an Internal Compilation Error (ICE) ❄️ label Oct 27, 2016
@pnkfelix pnkfelix removed I-ICE Issue: The compiler panicked, giving an Internal Compilation Error (ICE) ❄️ I-nominated labels Oct 27, 2016
@arielb1
Contributor

arielb1 commented Oct 27, 2016

This issue looks very confusing - bad debuginfo caused by misplaced spans can be quite a bit of trouble.

@nox
Contributor

nox commented Oct 28, 2016

@pnkfelix Btw, that DEBUG message you pasted does not correspond to the first ’ in that file, but to the second, in:

Let’s parse a valid URL and look at its components.

pnkfelix added a commit to pnkfelix/rust that referenced this issue Oct 28, 2016
@SimonSapin
Contributor

> I think the compiler will often include a character position within the file as part of its user feedback in error messages.

As I mentioned on IRC, a code point count is an approximation (maybe good enough in many cases) of the width taken by some text in a monospace font, but it is not the same. Unicode contains both "combining" code points that don’t take horizontal space of their own, and "double-width" characters (many from Asian languages) that take two columns.

https://github.com/unicode-rs/unicode-width implements an algorithm to deal with this. It was once upon a time part of std and was removed since, but the compiler could use it without std exposing it. (I don’t know if the compiler ever used it.)
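
A small standard-library-only illustration of the mismatch: neither the byte length nor the code point count matches the number of columns a terminal would typically use. The helper name is made up for this example, and the column widths mentioned in the comments describe typical rendering rather than anything the code computes.

```rust
/// Returns (length in bytes, number of code points) for `s`.
/// Hypothetical helper, for illustration only.
fn counts(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

fn main() {
    // 'e' followed by U+0301 COMBINING ACUTE ACCENT: typically rendered
    // in one column, yet it is two code points and three bytes.
    assert_eq!(counts("e\u{301}"), (3, 2));

    // U+4E2D is an East Asian wide character: typically rendered two
    // columns wide, yet it is a single code point and three bytes.
    assert_eq!(counts("\u{4E2D}"), (3, 1));
}
```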


Regardless (even if the approximation is kept instead of the Unicode algorithm), I think in the long term it would be beneficial to refactor libsyntax to remove CharPos and only compute text widths when error messages are composed. For sure this is more work than just fixing this particular ICE (and there are probably difficulties I haven’t foreseen), but I believe it would help avoid having a similar bug come up again.
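
As a rough sketch of what "compute it only when composing the message" could look like, the function below derives a column from a byte offset on demand instead of maintaining a separate character-position table. This is an assumption-laden illustration, not how libsyntax works: the function name is hypothetical, and the column is counted in code points (the approximation discussed above), not in display width.

```rust
/// Computes a 0-based column, counted in code points, for a byte offset
/// into `src`. Returns None if the offset is out of range or falls inside
/// a multibyte character. Hypothetical helper, for illustration only.
fn column_in_chars(src: &str, byte_offset: usize) -> Option<usize> {
    // `is_char_boundary` is false for offsets past the end of the string
    // and for offsets that point into the middle of a character.
    if !src.is_char_boundary(byte_offset) {
        return None;
    }
    // Start of the line containing the offset: just after the previous '\n'.
    let line_start = src[..byte_offset].rfind('\n').map_or(0, |i| i + 1);
    Some(src[line_start..byte_offset].chars().count())
}

fn main() {
    let src = "a\nLet’s go";
    // Byte 8 is the 's' after the 3-byte ’; four characters precede it on its line.
    assert_eq!(column_in_chars(src, 8), Some(4));
    // Byte 6 points into the middle of ’, so no column is reported.
    assert_eq!(column_in_chars(src, 6), None);
}
```

With this shape, an offset that lands inside a multibyte character surfaces as a None the caller has to handle, rather than as a failed assertion.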

bors added a commit that referenced this issue Oct 29, 2016
Do not intern filemap to entry w/ mismatched length.

Fix #37274 (I think).

Beta-nominated; note that only the second commit needs to be cherry picked to beta branch. (The first just adds some debug instrumentation that I wish had been present.)
@pnkfelix pnkfelix added the I-ICE Issue: The compiler panicked, giving an Internal Compilation Error (ICE) ❄️ label Oct 31, 2016
brson pushed a commit to brson/rust that referenced this issue Nov 3, 2016