
LanguageServer.jl uses characters for position rather than UTF-16 codeunits #401

@non-Jedi


As per the LSP specification text quoted in #400:

A position inside a document (see Position definition below) is expressed as a zero-based line and character offset. The offsets are based on a UTF-16 string representation. So a string of the form a𐐀b the character offset of the character a is 0, the character offset of 𐐀 is 1 and the character offset of b is 3 since 𐐀 is represented using two code units in UTF-16.
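The offsets in the quoted example can be reproduced with a short Python sketch. Python is used only for illustration, and `utf16_offsets` is a hypothetical helper. A codepoint above U+FFFF, such as 𐐀 (U+10400), takes two UTF-16 code units (a surrogate pair), while a codepoint in the Basic Multilingual Plane takes one.

```python
from typing import List, Tuple


def utf16_offsets(s: str) -> List[Tuple[str, int]]:
    """Return (character, zero-based UTF-16 offset) for each codepoint in s."""
    result = []
    offset = 0
    for ch in s:
        result.append((ch, offset))
        # Codepoints above U+FFFF are encoded as a surrogate pair (2 units).
        offset += 2 if ord(ch) > 0xFFFF else 1
    return result


pairs = utf16_offsets("a\U00010400b")  # "a𐐀b"
print([off for _, off in pairs])  # → [0, 1, 3]
```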

The character offset should be in terms of UTF-16 code units. As far as
I can tell, LanguageServer.jl only uses UTF-8 internally and works in
terms of characters (codepoints) rather than UTF-16 code units. eglot
works around this, but not all editors might. I have no idea how VSCode
behaves.
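One way a server could translate an incoming `character` offset is to walk the line and count UTF-16 code units per codepoint. The following is a minimal Python sketch under that assumption; `codepoint_index` and `utf8_byte_offset` are hypothetical helpers, not LanguageServer.jl functions. Since Julia `String` indices are UTF-8 byte positions (1-based), a Julia implementation would probably want the byte offset rather than the codepoint index.

```python
def codepoint_index(line: str, utf16_offset: int) -> int:
    """Map a zero-based UTF-16 code-unit offset to a zero-based codepoint index.

    An offset that lands inside a surrogate pair maps to the codepoint that
    pair encodes; an offset at or past the end of the line maps to len(line).
    """
    units = 0
    for i, ch in enumerate(line):
        width = 2 if ord(ch) > 0xFFFF else 1  # non-BMP -> surrogate pair
        if utf16_offset < units + width:
            return i
        units += width
    return len(line)


def utf8_byte_offset(line: str, utf16_offset: int) -> int:
    """Zero-based UTF-8 byte offset of the codepoint containing utf16_offset."""
    return len(line[: codepoint_index(line, utf16_offset)].encode("utf-8"))


line = "\U00010400" * 4 + "=" + "\U00010400" * 3  # 𐐀𐐀𐐀𐐀=𐐀𐐀𐐀
print(codepoint_index(line, 6))   # → 3 (the fourth 𐐀)
print(utf8_byte_offset(line, 6))  # → 12
```

Mapping an offset inside a surrogate pair to the enclosing codepoint is a design choice for hover-style lookups; a server could instead reject such offsets.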

For example, given the file:

𐐀𐐀𐐀="hello"
𐐀𐐀𐐀𐐀=𐐀𐐀𐐀

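Requesting a hover at line 1, character 6 should show the hover for
𐐀𐐀𐐀𐐀, since the 7th UTF-16 code unit (zero-based offset 6) still falls
within that variable. Instead, the offset appears to be treated as a
codepoint index, which lands on the 𐐀𐐀𐐀 on the right-hand side, so the
server shows the hover for 𐐀𐐀𐐀: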
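To make the mismatch concrete, here is a minimal Python sketch that looks up the identifier covering offset 6 on line 1 under both interpretations. It assumes identifiers can be approximated by the regex `\w+` (Python's `re` treats 𐐀 as a word character); `identifier_at` and `codepoint_index` are hypothetical helpers and not how LanguageServer.jl actually resolves symbols.

```python
import re
from typing import Optional

LINE = "\U00010400" * 4 + "=" + "\U00010400" * 3  # line 1: 𐐀𐐀𐐀𐐀=𐐀𐐀𐐀


def codepoint_index(line: str, utf16_offset: int) -> int:
    """Map a zero-based UTF-16 offset to the index of the codepoint it falls in."""
    units = 0
    for i, ch in enumerate(line):
        width = 2 if ord(ch) > 0xFFFF else 1
        if utf16_offset < units + width:
            return i
        units += width
    return len(line)


def identifier_at(line: str, cp_index: int) -> Optional[str]:
    """Return the word-character run containing codepoint cp_index, if any."""
    for m in re.finditer(r"\w+", line):
        if m.start() <= cp_index < m.end():
            return m.group()
    return None


# Spec-conforming reading: 6 is a UTF-16 offset.
expected = identifier_at(LINE, codepoint_index(LINE, 6))
# Reading that matches the observed reply: 6 is a codepoint index.
observed = identifier_at(LINE, 6)
print(len(expected), len(observed))  # → 4 3
```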

client-request (id:105) Wed Oct 16 16:29:55 2019:
(:jsonrpc "2.0" :id 105 :method "textDocument/hover" :params
          (:textDocument
           (:uri "file:///home/adam/tmp/test.jl")
           :position
           (:line 1 :character 6)))

server-reply (id:105) Wed Oct 16 16:29:55 2019:
(:id 105 :jsonrpc "2.0" :result
     (:contents
      [(:language "julia" :value "𐐀𐐀𐐀 = \"hello\"")]))
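eglot's event log prints JSON-RPC messages as Lisp plists. The request above corresponds to roughly the following JSON, sketched here in Python together with the LSP base-protocol framing (a `Content-Length` header giving the body size in bytes). The `character` field is the value the specification defines in UTF-16 code units.

```python
import json

# JSON equivalent of eglot's plist for request id 105.
request = {
    "jsonrpc": "2.0",
    "id": 105,
    "method": "textDocument/hover",
    "params": {
        "textDocument": {"uri": "file:///home/adam/tmp/test.jl"},
        # Per the LSP spec, `character` counts UTF-16 code units.
        "position": {"line": 1, "character": 6},
    },
}

body = json.dumps(request).encode("utf-8")
# Base protocol framing: Content-Length is the body size in bytes.
message = b"Content-Length: %d\r\n\r\n" % len(body) + body
print(message.decode("utf-8"))
```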

There's some discussion about the awkwardness of using UTF-16 code units at microsoft/language-server-protocol#376 and a survey of other implementations at https://github.com/Avi-D-coder/lsp-range-unit-survey.
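The cost of the choice of unit is easy to see on the example line: the same position (the `=` after 𐐀𐐀𐐀𐐀) has a different column in each unit. A small Python sketch, with `column_in_units` as a hypothetical helper:

```python
def column_in_units(line: str, cp_index: int) -> dict:
    """Column of codepoint cp_index, measured in three different units."""
    prefix = line[:cp_index]
    return {
        "utf-8": len(prefix.encode("utf-8")),            # bytes
        "utf-16": len(prefix.encode("utf-16-le")) // 2,  # code units (no BOM)
        "codepoints": len(prefix),                       # Python str length
    }


line = "\U00010400" * 4 + "=" + "\U00010400" * 3  # 𐐀𐐀𐐀𐐀=𐐀𐐀𐐀
print(column_in_units(line, 4))
# → {'utf-8': 16, 'utf-16': 8, 'codepoints': 4}
```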
