Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use 1-based indexing for columns in debuginfo #65676

Closed

Conversation

ecstatic-morse
Copy link
Contributor

@ecstatic-morse ecstatic-morse commented Oct 21, 2019

edited from original

Resolves #65437.

This fixes an off-by-one error when writing column numbers in DWARF, which uses 1-based indexing.

For the newly added test, the resulting debuginfo is:

!129 = distinct !DISubprogram(name: "main", linkageName: "_ZN20debug_column_1_based4main17h2f9f746428050388E", scope: !130, file: !18, line: 16, type: !12, scopeLine: 16, flags: DIFlagPrototyped, spFlags: DISPFlagLocalToUnit | DISPFlagDefinition | DISPFlagOptimized | DISPFlagMainSubprogram, unit: !17, templateParams: !4, retainedNodes: !131)
!130 = !DINamespace(name: "debug_column_1_based", scope: null)
!131 = !{!132}
!132 = !DILocalVariable(name: "x", scope: !133, file: !18, line: 11, type: !46, align: 4)
!133 = distinct !DILexicalBlock(scope: !129, file: !18, line: 11, column: 5)
!134 = !DILocation(line: 11, column: 9, scope: !133)
!135 = !DILocation(line: 11, column: 9, scope: !129)
!136 = !DILocation(line: 11, column: 13, scope: !129)
!137 = !DILocation(line: 12, column: 1, scope: !129)

You can see that the locations for x and 32 are correct (columns 9 and 13 respectively), the DILexicalBlock for main begins at column 5 (the first let statement) and the implicit return statement has column 1 (the closing }).

r? @michaelwoerister

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Oct 21, 2019
@RReverser
Copy link
Contributor

Thanks for working on this!

and that changing column numbers for implicit returns should be considered in a separately. I'm not sure if @RReverser agrees.

I don't really agree because, as the issue name and description says, it was about off-by-one column in DWARF and it shown both cases where it happens.

This fixes one of the cases, but breaks the other one instead (ending column was correct before, but instead becomes off-by-one now). Introducing regression while fixing a bug is far from ideal, which is why I didn't make a PR yet and wrote on the issue that this needs further investigation.

However, the DILexicalBlock for main begins at column 4 and the return statement (I think?) has column 2, both of which are whitespace.

Note that this is not fully correct:

  1. Column 4 for the beginning points not at whitespace but at letter m, as it should (as it's 1-based column, so it's 4th character in fn m).
  2. Column 2 for the ending points not at whitespace but past the end of the file, which might introduce more serious issues for DWARF consumers.

I'd suggest holding off this PR for now and fixing the column for closing braces / return statement either as part of it or in a separate one before merging this one to avoid introducing new potential issues.

@ecstatic-morse
Copy link
Contributor Author

ecstatic-morse commented Oct 21, 2019

  1. Column 4 for the beginning points not at whitespace but at letter m, as it should (as it's 1-based column, so it's 4th character in fn m).

This would be true if it was line 14, but the scope begins on line 15. Otherwise I agree with the facts above.

@RReverser
Copy link
Contributor

This would be true if it was line 14, but the scope begins on line 15.

Oh, you're right. Looks like original actually did the same - can you try your example without the patch? If it's still the same, then there is probably another place where col + 1 should happen, specifically for the lexical blocks.

@est31
Copy link
Member

est31 commented Oct 22, 2019

However, the DILexicalBlock for main begins at column 4

Huh, that's a good point. I think there might be another place where we need to add +1 as well.

The code this PR changes only seems to be involved in DILocation creation and thus only they are being fixed. I suppose the place creating DILexicalBlock would be this place.

@RReverser
Copy link
Contributor

I think you'd need to change a column here for the scope to work.

loc.col.to_usize() as c_uint))

@RReverser
Copy link
Contributor

@est31 Oh, race condition, haha :)

@est31
Copy link
Member

est31 commented Oct 22, 2019

As I understand it, it is @est31's view that this would completely resolve the original issue and that changing column numbers for implicit returns should be considered in a separately.

Indeed that's my view.
The comment above the one you linked contains a snippet of dumped MIR info together with line and column numbers. The column number for the return statement is similarly off by one. So that problem is definitely separate from the missing +1 problem. Yes, this PR means that for ret statements, the two issues don't cancel out each other any more. But it feels to me that there are less ret statements than there are non-ret statements so overall the PR is an improvement. I think @RReverser agrees with this PR as well, so I don't think there was any disagreement here. shrug.


// CHECK-LABEL: DISubprogram(name: "main"
// CHECK: {{.*}}DILocalVariable(name: "x",{{.*}}line: 15{{.*}}
// CHECK-NEXT: {{.*}}DILexicalBlock{{.*}}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe turn this line into a test for the DILexicalBlock change?

Copy link
Contributor Author

@ecstatic-morse ecstatic-morse Oct 22, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not that fast okay 😄

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was caught between commenting too early and risking that you already want to do it or commenting too late at which point you might already be doing something else. :).

@RReverser
Copy link
Contributor

But it feels to me that there are less ret statements than there are non-ret statements so overall the PR is an improvement.

My counter-argument is that it's not so much the count that matters, as severity. Previously columns were off, but at least both the [accidentally] correct ones (for return statements) and the incorrect ones (for everything else) stayed within bounds of the line, whereas now columns can point past the end of the line or even the file, which can create more serious issues for editors or other debugging tooling that doesn't expect this sort of overflow issues.

@ecstatic-morse
Copy link
Contributor Author

I'd be a bit surprised if DWARF consumers borked on one-past-the-end column numbers, but I've seen (and written) dumber things. If we were to commit to never emitting such a location in DWARF, where should the span for the return statement inside the closure point to in the following? The 2?

fn func() -> fn() -> i32 {
    || 42
}

@RReverser
Copy link
Contributor

RReverser commented Oct 22, 2019

Is there no way we can detect and change column at the LLVM debug location level, rather than MIR as you (@est31) described in the issue? In DWARF there is end_sequence marker for end of function, which is easy to detect, so there should be some way to detect and adjust column for these locations programmatically before emitting?

@RReverser
Copy link
Contributor

If we were to commit to never emitting such a location in DWARF, where should the span for the return statement inside the closure point to in the following? The 2?

That's what it does right now (before the patch), which is admittedly a bit weird, but seems reasonable enough.

@est31
Copy link
Member

est31 commented Oct 22, 2019

where should the span for the return statement inside the closure point to in the following? The 2?

The ret statement has nothing corresponding to it in the surface syntax of the language. So demanding that you find something corresponding is hard I guess :). Both the 2 as well as pointing towards the end of the line are bad choices here. I don't have an opinion onto which is better.

@ecstatic-morse
Copy link
Contributor Author

ecstatic-morse commented Oct 22, 2019

After a sleep, I've decided to simply map zero-width spans to the preceding column (instead of the succeeding one). As a result, we should never emit columns pointing past the end of a line.

My purpose in posting the example above was to illustrate a case where it would be desirable for a debuginfo column to point past the end of a line if that were supported by most DWARF consumers. I don't know if this is the case in practice, and no ruling is made in the DWARF spec AFAIK. @RReverser, perhaps you could share what tools you're using that make use of this information? It might be nice to test against them in the future, since LLVM's FileCheck tests aren't really designed for DWARF.

It would be ideal if spans pointed to the closing brace of a code block. This would give better errors in rustc as well, such as in this error message. As @est31 has explained, this is not trivial, and should go in another issue.

I think this should resolve everyone's concerns.

@ecstatic-morse
Copy link
Contributor Author

I've edited the PR summary to reflect the new updates.

@RReverser
Copy link
Contributor

After a sleep, I've decided to simply map zero-width spans to the preceding column (instead of the succeeding one). As a result, we should never emit columns pointing past the end of a line.

Sounds reasonable to me!

It would be ideal if spans pointed to the closing brace of a code block. This would give better errors in rustc as well, such as in this error message . As @est31 has explained, this is not trivial, and should go in another issue.

I actually wonder if it could be as simple as what you're doing above, but in a proper place - that is, just modifying this span:

let fn_end = span.shrink_to_hi();

that @est31 referenced earlier, to point to 1 character before? Then it should fix both DWARF and cases like error message you referenced, right?

@ecstatic-morse
Copy link
Contributor Author

ecstatic-morse commented Oct 22, 2019

I actually wonder if it could be as simple as what you're doing above, but in a proper place - that is, just modifying this span:

I tried that yesterday. The problem is that not all hir::ExprKind::Blocks end in }. If you git blame on the line above that one, you'll see this being discussed. Naively subtracting one causes a lot of ICEs due to underflow. Apparently some blocks end at BytePos(0) in their respective file (I think these come from proc macros?).

@RReverser
Copy link
Contributor

RReverser commented Oct 22, 2019

The BytePos(0) ones sound weird and should be excluded, but for others I think we still want to point to the last char, even if it's not a brace (like in example you provided above with || 42). At least this would keep the existing behaviour until we implement something more sophisticated in future.

@ecstatic-morse
Copy link
Contributor Author

@michaelwoerister Any chance of getting this reviewed? I'm confident that this is the best solution that doesn't require changing the attribution of spans while building the MIR (see #37310). It's pretty minimal, and it avoids creating columns that point past the end of a line. Although I've still not seen evidence that this causes a problem for DWARF tools in the wild, it seems prudent to avoid this for now.

@RReverser
Copy link
Contributor

FWIW I agree - this PR with latest changes is resulting in correct locations for all mentioned cases, so as far as I'm concerned the issue is resolved. Thanks @ecstatic-morse!

My suggestion on ways to clean this up further like above still applies, but refactoring MIR locations is generally a bigger task and can be certainly out of scope of this issue/PR.

@michaelwoerister
Copy link
Member

I put it on my todo list.

Copy link
Member

@michaelwoerister michaelwoerister left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR, @ecstatic-morse! It looks good to me in general (nice test!) but maybe we can clarify the comment in from_span.

// x|yz = 1
// xy|z = 2
//
// See discussion in https://github.com/rust-lang/rust/issues/65437
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really understand what this comment means. Maybe we can try to make it clearer to the casual reader.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How's this? Or is it the table specifically that's confusing?

// Rust likes to emit zero-width spans that point just after a closing brace to denote e.g.
// implicit return from a function. Instead of mapping zero-width spans to the column at
// `span.lo` like we do normally, map them to the column immediately before the span. This
// ensures that we point to a closing brace in the common case, and that debuginfo doesn't
// point past the end of a line. A zero-width span at the very start of the line gets
// mapped to `0`, which is used to represent "no column information" in DWARF. For example,
//
//   Span len = 0           Span len = 1
//
//   |xyz => 0 (None)        |x|yz => 1
//   x|yz => 1               x|y|z => 2
//
// See discussion in https://github.com/rust-lang/rust/issues/65437 for more info.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any documentation on when zero-width spans are emitted and for what reason? Or is this reverse-engineering the current (unspecified) behavior?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The second one XD

Copy link
Contributor Author

@ecstatic-morse ecstatic-morse Oct 30, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the code responsible (added in #37310):

// Attribute epilogue to function's closing brace
let fn_end = span.shrink_to_hi();

@bors
Copy link
Contributor

bors commented Nov 1, 2019

☔ The latest upstream changes (presumably #65718) made this pull request unmergeable. Please resolve the merge conflicts.

@michaelwoerister
Copy link
Member

I'll give this a closer look next week.

@ecstatic-morse
Copy link
Contributor Author

ecstatic-morse commented Nov 4, 2019

Rebased to fix merge conflicts and squashed all commits that modified only the test.

@michaelwoerister
Copy link
Member

Alright, I've given this some more thought and my conclusions are:

  • 1-based column indices are what we want because other tools produce them too
  • On the one hand I don't like that this PR is kind of reverse engineering unspecified behavior of MIR construction. On the other hand, that approach is not so different from how source location handling has always worked in debuginfo, so it's not much of a regression. And I personally don't have any time allocated to looking into a more principled approach.
  • Since we are relying on poorly specified behavior, we need good tests, especially around the corner cases.

The PR already adds a codegen test and I think it should extended to check for

  • nested scopes
  • closures with and without braces around the body
  • functions with explicit return statements
  • (other cases?)

Reading these tests is hard, so adding lots of explanatory comments is a good idea.

@ecstatic-morse
Copy link
Contributor Author

ecstatic-morse commented Nov 5, 2019

I agree that more tests is better in the abstract, but I'm confused as to why this PR demands them?

Currently, all column-info is off-by-one, with the exception of column-info for zero-width spans at the end of a function, which happen to be correct due to a second, offsetting off-by-one error (though it's not really an error I suppose, it's just a bit strange). All this PR does is fix the first off-by-one error, and mitigate the second. The worst thing that can happen as a result of this PR is we generate a bad column number (which we already do, everywhere).

I can add a function with an explicit return statement and closures to the test, but that won't be testing anything in this PR, it will be testing the spans we generate. Those tests can be written without using LLVM's FileCheck, which is brittle and hard to read.

@michaelwoerister
Copy link
Member

I agree that more tests is better in the abstract, but I'm confused as to why this PR demands them?

The PR adds some non-intuitive logic via the from_span method. It is making assumptions about the spans that MIR construction generates and those assumptions should be tested, otherwise things can regress without anyone noticing before the regression lands.

I can add a function with an explicit return statement and closures to the test, but that won't be testing anything in this PR, it will be testing the spans we generate.

My thinking is that we should test that the spans we generate early in the pipeline end up as the spans we expect in debuginfo. As far as I know we don't have tests for column numbers, so a PR that proposes to fix them should add regression tests.

Those tests can be written without using LLVM's FileCheck, which is brittle and hard to read.

I'm open to any method of testing!

@ecstatic-morse
Copy link
Contributor Author

Very well. I don't expect to pick this up again anytime soon, so if anyone wants to take this across the finish line (or try a different approach entirely), I would be greatly appreciate it.

@est31
Copy link
Member

est31 commented Nov 6, 2019

@michaelwoerister what about a smple change that adds 1 unconditionally? That shouldn't be controversial and was my initial proposal for the first step.

@Dylan-DPC-zz
Copy link

Dylan-DPC-zz commented Nov 12, 2019

@ecstatic-morse I'm closing this based on the comment in #65676. Thanks for taking the time to contribute :)

@Dylan-DPC-zz Dylan-DPC-zz added S-inactive Status: Inactive and waiting on the author. This is often applied to closed PRs. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Nov 12, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-inactive Status: Inactive and waiting on the author. This is often applied to closed PRs.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Off-by-one in generated DWARF columns
7 participants