LanguageServer.jl uses characters for position rather than UTF-16 codeunits

As per the quote in #400:

>A position inside a document (see Position definition below) is expressed as a zero-based line and character offset. The offsets are based on a UTF-16 string representation. So a string of the form a𐐀b the character offset of the character a is 0, the character offset of 𐐀 is 1 and the character offset of b is 3 since 𐐀 is represented using two code units in UTF-16.

The character offset should therefore be counted in UTF-16 code units.
As far as I can tell, LanguageServer.jl only uses UTF-8 internally and
counts positions in characters (codepoints) rather than code units. [eglot works around
this](https://github.com/joaotavora/eglot/blob/260f152634df2ba84ef3e51bdfd4f90a20babd9b/eglot.el#L1007),
but other editors might not. I have no idea how VSCode behaves.
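
To illustrate the three ways of counting, here is a minimal sketch using only Julia Base functions on the string from the spec quote. `utf16_offsets` is a hypothetical helper written for this issue, not something that exists in LanguageServer.jl.

```julia
s = "a𐐀b"

length(s)                      # 3 characters (codepoints)
ncodeunits(s)                  # 6 UTF-8 code units (1 + 4 + 1)
length(transcode(UInt16, s))   # 4 UTF-16 code units (1 + 2 + 1)

# Hypothetical helper: zero-based UTF-16 offset of each character,
# counted the way the LSP spec describes.
function utf16_offsets(str::AbstractString)
    offsets = Int[]
    pos = 0
    for c in str
        push!(offsets, pos)
        # Codepoints above U+FFFF need a surrogate pair (two code units).
        pos += codepoint(c) > 0xFFFF ? 2 : 1
    end
    return offsets
end

utf16_offsets(s)  # [0, 1, 3]
```

The offsets `[0, 1, 3]` match the spec's example, whereas counting characters would give `[0, 1, 2]`.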

For example, given the file:

```julia
𐐀𐐀𐐀="hello"
𐐀𐐀𐐀𐐀=𐐀𐐀𐐀
```

Requesting a hover at line 1, character 6 should return the hover for
`𐐀𐐀𐐀𐐀`, since the 7th UTF-16 code unit (offset 6) is still within that
variable. Instead, the server counts characters, lands on the 7th
character, and returns the hover for `𐐀𐐀𐐀`:

```
client-request (id:105) Wed Oct 16 16:29:55 2019:
(:jsonrpc "2.0" :id 105 :method "textDocument/hover" :params
          (:textDocument
           (:uri "file:///home/adam/tmp/test.jl")
           :position
           (:line 1 :character 6)))

server-reply (id:105) Wed Oct 16 16:29:55 2019:
(:id 105 :jsonrpc "2.0" :result
     (:contents
      [(:language "julia" :value "𐐀𐐀𐐀 = \"hello\"")]))
```
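
As a rough sketch of the conversion a server could apply to an incoming position, the function below maps a zero-based UTF-16 offset to a zero-based character offset within one line. The name `utf16_offset_to_char_offset` is hypothetical, and this is not how LanguageServer.jl is implemented; it only shows the arithmetic for the example file above.

```julia
# Hypothetical helper, not part of LanguageServer.jl.
# Convert a zero-based UTF-16 code unit offset (as sent by an LSP client)
# into a zero-based character (codepoint) offset within `line`.
function utf16_offset_to_char_offset(line::AbstractString, utf16_offset::Integer)
    units = 0   # UTF-16 code units consumed so far
    chars = 0   # characters consumed so far
    for c in line
        # Codepoints above U+FFFF take two UTF-16 code units.
        width = codepoint(c) > 0xFFFF ? 2 : 1
        # Stop once the requested offset falls at or inside this character.
        units + width > utf16_offset && return chars
        units += width
        chars += 1
    end
    return chars  # offset at or past the end of the line
end

line = "𐐀𐐀𐐀𐐀=𐐀𐐀𐐀"

utf16_offset_to_char_offset(line, 6)  # 3, the fourth 𐐀, inside `𐐀𐐀𐐀𐐀`
utf16_offset_to_char_offset(line, 8)  # 4, the `=`
```

Read as a character offset, 6 instead points at the second `𐐀` after the `=`, which is why the reply above describes `𐐀𐐀𐐀`.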

There's some discussion about the awkwardness of using UTF-16 code units at https://github.com/microsoft/language-server-protocol/issues/376 and a survey of other implementations at https://github.com/Avi-D-coder/lsp-range-unit-survey.
