Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LSP SemanticTokens offset incorrect in some cases #2715

Closed
jnt0r opened this issue Jun 25, 2023 · 10 comments
Closed

LSP SemanticTokens offset incorrect in some cases #2715

jnt0r opened this issue Jun 25, 2023 · 10 comments

Comments

@jnt0r
Copy link
Contributor

jnt0r commented Jun 25, 2023

The offset in SemanticTokens response is slightly off in some cases. I debugged this and found out, that the wrong offset is calculated deeper in xtext. The TextRegion is already filled with the wrong offset, which is then used to calculate the line and character in the SemanticTokensService. I don't know and yet didn't find where the TextRegion is created and filled with the offset.

It looks like line ending characters (\n) are counted in the offset as well:

grafik

grafik

grafik

@cdietrich
Copy link
Member

cdietrich commented Jun 25, 2023

Xtext offsets are for the complete document and this include whitespace

so the question is if they are correctly translated to line and column for lsp

a sample grammar and unit test might be helpful ( for the general offset of Xtext INode as well for the lsp Problem )

see e.g. org.eclipse.xtext.nodemodel.impl.AbstractNode.getTextRegion()

@jnt0r
Copy link
Contributor Author

jnt0r commented Jun 25, 2023

I try to take a look into this in the next days.

@jnt0r
Copy link
Contributor Author

jnt0r commented Jun 26, 2023

I think adjusting the way of calculating offsets is very complicated, as it is used widely in xtext and lays deeply in the core. Therefore, I am currently taking a look at the PositionReader that is returning an LSP4j Position based on an InputStream and an offset. I try to adjust it, so it ignores newlines.

@cdietrich
Copy link
Member

yes i assume the implementor of PositionReader simply overlooked this issue

@jnt0r
Copy link
Contributor Author

jnt0r commented Jun 28, 2023

Update: The Problem lays in the underlying LineNumberReader which is used in the PositionReader. The LineNumberReader skips \r characters and does not count them as a character when reading.

grafik

When the offset is calculated, it includes \r characters. Therefore, we get a wrong position, as these characters are skipped when resolving the position to this offset.

As the LineNumberReader is part of the java.io package, I try to work around this behavior in the PositionReader by detecting if carriage return characters are used and then subtracting the line number from the calculated column. Still not 100% happy with my solution, so will think about a bit more.

@szarnekow
Copy link
Contributor

Can you provide an isolated test case that illustrates what you are observing?
Something like

var reader = new PositionReader("some string literal");
reader.read(..)
...
assertEquals(.. , reader.getPosiiton())

?

@jnt0r
Copy link
Contributor Author

jnt0r commented Jun 30, 2023

I created this branch in my fork: https://github.com/jnt0r/xtext/tree/positionreader-demo-test with this test main...jnt0r:xtext:positionreader-demo-test

I'm not sure if we can assume that all line breaks contain carriage returns, or if there can be a mix of single \n and \r\n.
Normally, this should be specific to the OS and therefore consistent unless somebody manipulates the file.
My TestCase currently contains a mix of \n and \r\n.

@jnt0r
Copy link
Contributor Author

jnt0r commented Jul 6, 2023

I just created a PR with my current solution.

@jnt0r
Copy link
Contributor Author

jnt0r commented Jul 8, 2023

I closed my PR in favor of #2735

szarnekow pushed a commit that referenced this issue Jul 11, 2023
The previous implementation of PositionReader extended LineNumberReader.
Unfortunately, the implementation of PositionReader.read() compresses
line endings. In other words, both "\n" and "\r\n" contribute one
character to the document offset count. This behavior is undesired, and
can lead to the production of incorrect line and column offsets.

This commit changes PositionReader to extend BufferedReader instead.
Now, calling read() always reads exactly one character. We also
implement logic specific to PositionReader to correctly handle "\r\n",
"\n" and "\r" line endings.

The PositionReader class now also keeps track of both the line and
column values internally. The mark() and reset() functions have been
updated accordingly.

Signed-off-by: Joao Ferreira <Joao.Ferreira@avaloq.com>
@jnt0r
Copy link
Contributor Author

jnt0r commented Jul 16, 2023

Fixed by #2735

@jnt0r jnt0r closed this as completed Jul 16, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants