LSP SemanticTokens offset incorrect in some cases #2715

jnt0r · 2023-06-25T06:55:33Z

The offset in SemanticTokens response is slightly off in some cases. I debugged this and found out, that the wrong offset is calculated deeper in xtext. The TextRegion is already filled with the wrong offset, which is then used to calculate the line and character in the SemanticTokensService. I don't know and yet didn't find where the TextRegion is created and filled with the offset.

It looks like line ending characters (\n) are counted in the offset as well:

cdietrich · 2023-06-25T08:01:23Z

Xtext offsets are for the complete document and this include whitespace

so the question is if they are correctly translated to line and column for lsp

a sample grammar and unit test might be helpful ( for the general offset of Xtext INode as well for the lsp Problem )

see e.g. org.eclipse.xtext.nodemodel.impl.AbstractNode.getTextRegion()

jnt0r · 2023-06-25T18:33:11Z

I try to take a look into this in the next days.

jnt0r · 2023-06-26T10:25:53Z

I think adjusting the way of calculating offsets is very complicated, as it is used widely in xtext and lays deeply in the core. Therefore, I am currently taking a look at the PositionReader that is returning an LSP4j Position based on an InputStream and an offset. I try to adjust it, so it ignores newlines.

cdietrich · 2023-06-26T14:31:56Z

yes i assume the implementor of PositionReader simply overlooked this issue

jnt0r · 2023-06-28T10:53:10Z

Update: The Problem lays in the underlying LineNumberReader which is used in the PositionReader. The LineNumberReader skips \r characters and does not count them as a character when reading.

When the offset is calculated, it includes \r characters. Therefore, we get a wrong position, as these characters are skipped when resolving the position to this offset.

As the LineNumberReader is part of the java.io package, I try to work around this behavior in the PositionReader by detecting if carriage return characters are used and then subtracting the line number from the calculated column. Still not 100% happy with my solution, so will think about a bit more.

szarnekow · 2023-06-28T13:05:31Z

Can you provide an isolated test case that illustrates what you are observing?
Something like

var reader = new PositionReader("some string literal");
reader.read(..)
...
assertEquals(.. , reader.getPosiiton())

?

jnt0r · 2023-06-30T17:25:31Z

I created this branch in my fork: https://github.com/jnt0r/xtext/tree/positionreader-demo-test with this test main...jnt0r:xtext:positionreader-demo-test

I'm not sure if we can assume that all line breaks contain carriage returns, or if there can be a mix of single \n and \r\n.
Normally, this should be specific to the OS and therefore consistent unless somebody manipulates the file.
My TestCase currently contains a mix of \n and \r\n.

jnt0r · 2023-07-06T17:22:46Z

I just created a PR with my current solution.

jnt0r · 2023-07-08T08:13:38Z

I closed my PR in favor of #2735

The previous implementation of PositionReader extended LineNumberReader. Unfortunately, the implementation of PositionReader.read() compresses line endings. In other words, both "\n" and "\r\n" contribute one character to the document offset count. This behavior is undesired, and can lead to the production of incorrect line and column offsets. This commit changes PositionReader to extend BufferedReader instead. Now, calling read() always reads exactly one character. We also implement logic specific to PositionReader to correctly handle "\r\n", "\n" and "\r" line endings. The PositionReader class now also keeps track of both the line and column values internally. The mark() and reset() functions have been updated accordingly. Signed-off-by: Joao Ferreira <Joao.Ferreira@avaloq.com>

jnt0r · 2023-07-16T07:36:21Z

Fixed by #2735

cdietrich added the help_wanted label Jun 25, 2023

jnt0r mentioned this issue Jul 6, 2023

Fix PositionReader wrong position on files with carriage returns #2734

Closed

jnt0r closed this as completed Jul 16, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

LSP SemanticTokens offset incorrect in some cases #2715

LSP SemanticTokens offset incorrect in some cases #2715

jnt0r commented Jun 25, 2023

cdietrich commented Jun 25, 2023 •

edited

Loading

jnt0r commented Jun 25, 2023

jnt0r commented Jun 26, 2023

cdietrich commented Jun 26, 2023

jnt0r commented Jun 28, 2023

szarnekow commented Jun 28, 2023

jnt0r commented Jun 30, 2023

jnt0r commented Jul 6, 2023

jnt0r commented Jul 8, 2023

jnt0r commented Jul 16, 2023

LSP SemanticTokens offset incorrect in some cases #2715

LSP SemanticTokens offset incorrect in some cases #2715

Comments

jnt0r commented Jun 25, 2023

cdietrich commented Jun 25, 2023 • edited Loading

jnt0r commented Jun 25, 2023

jnt0r commented Jun 26, 2023

cdietrich commented Jun 26, 2023

jnt0r commented Jun 28, 2023

szarnekow commented Jun 28, 2023

jnt0r commented Jun 30, 2023

jnt0r commented Jul 6, 2023

jnt0r commented Jul 8, 2023

jnt0r commented Jul 16, 2023

cdietrich commented Jun 25, 2023 •

edited

Loading