Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Source code with tabs did not highlight correctly #25

Open
legendiguess opened this issue Jan 19, 2020 · 17 comments
Open

Source code with tabs did not highlight correctly #25

legendiguess opened this issue Jan 19, 2020 · 17 comments
Labels
C-bug Category: bug
Milestone

Comments

@legendiguess
Copy link

Here some example Slice where t symbol in source code have one space behind:
image

That Slice give us that output:
image

Now i change space behind t from space to tab(which still one symbol, so range of highlight should not be changed):
image
And highlight in output is wrong now:
image

@legendiguess legendiguess changed the title Tabs in source code did not highlight correctly Source code with tabs did not highlight correctly Jan 19, 2020
@brendanzab
Copy link
Member

brendanzab commented Feb 18, 2020

Yeah, I think annotate snippets should replace tab characters with a user configurable number of spaces, and take it into account when computing underlines. This is what I do in codespan at least!

@zbraniecki
Copy link
Contributor

agree! I am swamped a bit these days but will try to debug it! PRs welcomed!

1 similar comment
@zbraniecki
Copy link
Contributor

agree! I am swamped a bit these days but will try to debug it! PRs welcomed!

@zbraniecki
Copy link
Contributor

@legendiguess - can you retest against master? #27 got merged and I can't reproduce it anymore locally!

@zbraniecki zbraniecki added the C-bug Category: bug label Mar 27, 2020
@legendiguess
Copy link
Author

@zbraniecki
Just retested it - problem still here:
изображение

Here gist with minimal reproducible example https://gist.github.com/legendiguess/ccb197dac4bc77e3ae7571befc8ffbe2

@zbraniecki
Copy link
Contributor

Ah, I see. Now I understand the issue.

@brendanzab 's solution is one option.

The other would be to only adjust the underline calculation to account for tabs in line, but I'm not sure if display of "\tt" is always 5 ascii characters.

@legendiguess
Copy link
Author

The other would be to only adjust the underline calculation to account for tabs in line, but I'm not sure if display of "\tt" is always 5 ascii characters.

\t move next characters to the next tabulation column. so basically it's adds from 1 to 4 ascii characters, dependent on caret position off last tabulation column

@zbraniecki
Copy link
Contributor

Yeah, so it may be tricky to predict. Wondering how rustc handles that in error reporting now.

@legendiguess
Copy link
Author

Yeah, so it may be tricky to predict. Wondering how rustc handles that in error reporting now.

https://github.com/rust-lang/rust/blob/5e8897b7b51636f157630e6639b711d698e1d101/src/librustc_errors/styled_buffer.rs#L16

They don't handle \t in a fully "proper way"

@brendanzab
Copy link
Member

brendanzab commented Mar 27, 2020

@brendanzab
Copy link
Member

Oh, and importantly I also measure the length of strings taking the tab width into account: https://github.com/brendanzab/codespan/blob/a77e646944765dfb338185b0ba9504500afb7304/codespan-reporting/src/term/config.rs#L35-L45

@brendanzab
Copy link
Member

@brendanzab
Copy link
Member

\t move next characters to the next tabulation column. so basically it's adds from 1 to 4 ascii characters, dependent on caret position off last tabulation column

Oh wait, I understand what you're saying now, @legendiguess - yes I don't handle this properly in codespan-reporting either 🤦‍♂

@tisonkun
Copy link

#58 can be a duplicate.

@tisonkun
Copy link

tisonkun commented Jun 26, 2023

a user configurable number of spaces

When I dig into this issue yesterday, it turns out to be quite subtle even the expand POSIX command can produce "unexpected" result.

That said, leading \t (only) is a easy case, while tab in the medium can be quite complex, especially mixed with other whitespace. Generally, tab means making table according to tabstop so you need to translate the tab into whitespace to ensure that the whitespace width are aligned at the tabstop. But it's very unclear that even expand handles \t and \t differently within one tabstop slot.

Concretely, with tabstop=4, \t is translate to four spaces while \t is translated to five spaces. I suspect it's because is treated either as part of the whitespace or part of the next "column".

Generally, users are aware of the "final" result displayed on their screen, and don't care what the whitespaces there are. So, I guess there is no consensus on how a tab should be formatted "properly". We can make our rule, while it can still be different to what users "expected".

@epage epage added this to the Rust Adoption milestone Mar 13, 2024
@estebank
Copy link

estebank commented Jul 3, 2024

For the record: in rustc we replace all \t unconditionally with 4 (we don't account for tabstops), and treat them as 4 chars for the purposes of underlines/span positioning, and so far it has worked better than it has any right to. No-one has complained that the tabstops are ignored, and we don't get any obvious positioning bugs.

@nicolasbarbier
Copy link

so far it has worked better than it has any right to

I only care about things aligning on my screen, not about the exact indentation width. I never use tabs except for indentation. I guess it's the same for most people.

This solution definitely gives the best bang for the (development effort) buck.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-bug Category: bug
Projects
None yet
Development

No branches or pull requests

7 participants