Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Document text document content request and update changelog #1994

Merged
merged 2 commits into from
Aug 7, 2024

Conversation

dbaeumer
Copy link
Member

@dbaeumer dbaeumer commented Aug 7, 2024

No description provided.

@dbaeumer dbaeumer enabled auto-merge (squash) August 7, 2024 17:06
@vs-code-engineering vs-code-engineering bot added this to the August 2024 milestone Aug 7, 2024
@dbaeumer dbaeumer merged commit aede136 into gh-pages Aug 7, 2024
2 checks passed
@dbaeumer dbaeumer deleted the dbaeumer/urban-raven-brown branch August 7, 2024 17:33
@DanTup
Copy link
Contributor

DanTup commented Aug 8, 2024

@dbaeumer I think I'm interested in this request (currently I have implemented custom requests for something similar), but there are two things that aren't clear to me:

  • does this only support a single scheme? A server can't register for multiple?
  • is there no way to indicate whether this scheme is read-only or not? (for example if it's generated code)

Although I've shipped custom requests for now, the feature is still highly experimental and I would love to switch to something that is standard LSP before it goes stable, but it needs to support read-only and I'm not sure we won't have additional schemes in the future.

Tnx!

@mfussenegger
Copy link

Servers like https://github.com/eclipse-jdtls/eclipse.jdt.ls also support opening *.class files. Currently it uses a java/classFileContents custom method for that.

Could this in addition support defining glob patterns a server can handle? (Or as replacement for the scheme)

@DanTup
Copy link
Contributor

DanTup commented Aug 8, 2024

Above I mentioned being readonly, but since this uses registerTextDocumentContentProvider I think it will be readonly (at least for VS Code). However, the spec changes don't suggest this to me:

Please note, that the content of any subsequent open notifications for the text document might differ from the returned content due to whitespace and line ending normalizations done on the client.

Sending a didOpen notification with different content to what the server provided sounds like telling the server the content has changed. I was not expecting any open notifications for these (in fact, I'm fairly sure they are not send today with my custom implementation which also uses registerTextDocumentContentProvider? 🤔)

Could this in addition support defining glob patterns a server can handle?

I think since this uses VS Code's registerTextDocumentContentProvider() API which only takes a scheme, that's not going to be possible.

(But, I do think multiple schemes should be supported by creating multiple providers, because this seems unnecessarily limited and may lead to sticking with non-standard implementations 😔)

@dbaeumer
Copy link
Member Author

dbaeumer commented Aug 9, 2024

Thanks for your feedback.

does this only support a single scheme? A server can't register for multiple?

Yes, a server can register for multiple scheme using dynamic registration. It felt strange to support an array of schemes but if users feel strongly about it I am open to add it.

Sending a didOpen notification with different content to what the server provided sounds like telling the server the content has changed.

Whether a didOpen is sent depends on whether the server has register with a document selector that matches the document that got opened. I am sure that this already happens with your code as well since VS Code itself emits that open notification.

The changed content can occur due to whitespace and line ending normalization. This can happen with any client and I think it is fair to point this out. A content change in case of typing would be signaled via a didChange event.

Glob patterns

I can add that to the spec behind a capability, but VS Code wouldn't implement it since it is not supported. So I would like to add something like this when we have a real client that supports it.

Read-Only

I suggest we add a client capability to the spec denoting if a client treats these document as readOnly or Writable. And I will turn the result of the request into a literal so that we can extend it later on in case we want to add a flag denoting readOnly in case the client treats them writable.

@dbaeumer
Copy link
Member Author

dbaeumer commented Aug 9, 2024

Regarding read-only. I will add to the spec that clients should treat the file as read-only since it is unclear what servers should do it the client tells the server it will treat it as writable.

If we want to support writable content we are better off to support a file system in the server.

@DanTup
Copy link
Contributor

DanTup commented Aug 9, 2024

Yes, a server can register for multiple scheme using dynamic registration. It felt strange to support an array of schemes but if users feel strongly about it I am open to add it.

I don't feel strongly, but it does feel a little odd to me that a server might only be able to register one scheme because a client doesn't support dynamic registration. A client might only support static registration because it's easier, without realising it has this restriction, and then a server has to decide what to do if it has two schemes but a client only supports static registration. It's potential grief for users that might ultimately require a request to their client to support dynamic registration (and the time in implementing and shipping that) for what seems like not a lot of benefit.

Whether a didOpen is sent depends on whether the server has register with a document selector that matches the document that got opened

Oh yes, I forgot about that - that's why I don't get them then :-)

The changed content can occur due to whitespace and line ending normalization. This can happen with any client and I think it is fair to point this out. A content change in case of typing would be signaled via a didChange event.

Hmm, actually this now sounds odd to me. I'm not at a machine to do any testing, but if the client normalises the content and the server is not getting didOpen requests (because it doesn't want them because this is readonly), are the offsets potentially going to be wrong because the client has a normalized version of the doc?

(related: I think this normalization is not well specified and VS Code's behaviour is awkward for servers - there's an open issue about that at #1853)

Regarding read-only. I will add to the spec that clients should treat the file as read-only
If we want to support writable content we are better off to support a file system in the server.

I agree with this 👍

@jwortmann
Copy link

Hello,
currently the result of the request only provides the content of the text document:

export interface TextDocumentContentResult {
	/**
	 * The text content of the text document. Please note, that the content of
	 * any subsequent open notifications for the text document might differ
	 * from the returned content due to whitespace and line ending
	 * normalizations done on the client
	 */
	text: string;
}

Am I right to assume that this "text document" is supposed to contain some form of source code?

If yes, then shouldn't the result also provide some kind of languageId, e.g. similar to TextDocumentItem, so that the client could apply syntax highlighting to this document?

@dbaeumer
Copy link
Member Author

@jwortmann the language is determined by the client. See the content provider as a simple file system, where the server can provide content for a text document. It is still up to the client which language is associated with the content (for example in VS Code language ids are contributable and the client might decide on these rules to pick another language id)

@jwortmann
Copy link

But how should the client do it then?

Perhaps I am misunderstanding the use case for this request. But my intuition from the client side would be the following: If the user clicks on a link with an unknown scheme (i.e. not file, http(s), etc.) in a hover popup, or if the URI of a Location/LocationLink has an unknown scheme, then the client would check whether the server has registered this scheme for workspace/textDocumentContent, and if this is the case it would send such a request to the server. So isn't it the server in this case, which knows what the language for the specified scheme is?

The only way I currently see is that the client could try to derive it from a file extension in the URI. But which language would you for example assign to the following URIs:

  • jdt:/path/to/MyClass.class
  • disassembled:/C:/path/to/lib.dll
  • res:/virtualdocuments/123

Also, if this is the intended way, it would basically imply that the languageId in TextDocumentItem would also be superfluous, because the server could derive it from the file extension in the URI too.

I don't know if these URIs are realistic examples, but if I misunderstand the purpose for this request like I described it above, could you please explain the use case in more detail? This PR seemingly appeared out of nowhere, and the specs only describe the method definitions but not what the purpose of "dynamically fetch the content of a text document" is.

@dbaeumer
Copy link
Member Author

This was a long outstanding feature request: #336

The request is for text document content only. Its purpose IS NOT to resolve the content of a http or ftp request. The language server protocol has not support for this and if we add the (at least for ftp) is would very likely be more like a file system provide than a content provider.

Even if a language server provides a content, it might not be the one providing the language id and its smarts. This is why the client decides on the language id and it is open in the spec how a client does this. VS Code has a registery to register language ids and uses AI (a language model) to determine the language id based on content.

The reason why the TextDocumentItem has a languageId property is exactly because the client determines the langue id and sends it on open to the server.

The example you have provided are releastic. For example for the first case you would register something in VS Code that tells it that a text document with the scheme jdt and the extension class has the languageId javaClassFile. It is very likely the extension that manages the language server that does that registration. Other clients might have other means to do that registration. This is why it is left up to the client.

@dbaeumer
Copy link
Member Author

@DanTup

Hmm, actually this now sounds odd to me. I'm not at a machine to do any testing, but if the client normalises the content and the server is not getting didOpen requests (because it doesn't want them because this is readonly), are the offsets potentially going to be wrong because the client has a normalized version of the doc?

It behaves exactly the same way as if the file exists on disk. When a file opens a open notification is sent to the server (if interested) and tells the server that the ownership of the content has passed to the client and the server is not support to read the content from disk. With the content provider it is the same: the open notification tells the server that the ownership of the content has passed to the client and the server can't trust its own content anymore. So as long as you have not received an open notification all offsets are correct. If you have received a open notifications the offsets must be based off the content that got passed to you in the open notification.

I don't feel strongly, but it does feel a little odd to me that a server might only be able to register one scheme because a client doesn't support dynamic registration.

I will add support for a array of schemes.

@DanTup
Copy link
Contributor

DanTup commented Aug 19, 2024

When a file opens a open notification is sent to the server (if interested) and tells the server that the ownership of the content has passed to the client and the server is not support to read the content from disk. With the content provider it is the same: the open notification tells the server that the ownership of the content has passed to the client and the server can't trust its own content anymore. So as long as you have not received an open notification all offsets are correct. If you have received a open notifications the offsets must be based off the content that got passed to you in the open notification.

So what if I register scheme foo:// to provide content, and I register to handle hover for it, but I do not subscribe for open events. When the user opens the file and hovers something, will I get the wrong offsets if the client normalization changed the line endings because I don't have a copy of the normalised content?

If so, it seems like handling requests for any document you don't accept open notifications for could be problematic (and a difficult to track down bug). It also feels slightly odd to accept overlays for read-only files like this. I kinda feel like client should never normalize read-only content?

I will add support for a array of schemes.

Thanks, I think that is for the better :)

@dbaeumer
Copy link
Member Author

So what if I register scheme foo:// to provide content, and I register to handle hover for it, but I do not subscribe for open events. When the user opens the file and hovers something, will I get the wrong offsets if the client normalization changed the line endings because I don't have a copy of the normalised content?

Yes, and this also happens for all other schemes as well. If you provide hover for a file scheme and you don't register for open events for it you might use the wrong offsets.

@DanTup
Copy link
Contributor

DanTup commented Aug 20, 2024

This really doesn't feel right for readonly-files. These are files that the server is generating and updating the contents of as other files change. They are explicitly readonly and the server is explicitly the source of truth. If the client is allowed to give the server a modified overlay, when the server regenerates files because something changes, there are now two sources of truth and it's not clear which one the server should be using to produce results.

What is the reason for a client being allowed to modify read-only files?

@DanTup
Copy link
Contributor

DanTup commented Aug 20, 2024

@dbaeumer I'm looking through the spec again to figure out if we can generate files in a way that would guarantee the client does not modify them (and thus not have to worry about mutated overlays). I found there is a capability that says whether a client will normalize line endings, but it does not provide any details about what this means - it's something I raised before at #1853

Specifically, it's not clear to me whether it means "The client will change all line endings to the majority line ending in the file", or "The client may take a file that is 100% \n and change it to 100% \r\n". If it's the latter, then there's no way I could produce a file that I can be sure the client will not mutate. If it's the former, then all I need to do is make sure my line endings are consistent.

I know VS Code just uses the must common line ending in the file, but that is not defined here by the spec. It would be nice if the server can know for sure the behaviour (or, be provided with the line ending the client will normalize to).

@dbaeumer
Copy link
Member Author

The truth is always the one that comes with the open notification. This is how the spec works today and the spec doesn't forbid any modification of file content in the open notification. To implement what you are suggesting we would need to change the spec to say that read only files must not be changed and that no open notification should be sent for them unless they are made writable.

@DanTup
Copy link
Contributor

DanTup commented Aug 20, 2024

To implement what you are suggesting we would need to change the spec to say that read only files must not be changed and that no open notification should be sent for them unless they are made writable.

I feel like that would be best, but I don't know if that's a big deal (because I don't know if it means changing VS Code).

Having a clearer specification of what normalization the client may do would be a good second though - because then the server could ensure it generates content that would not be mutated. Forcing a server to track client-provided versions of documents that it is generated feels like a lot of added complexity, because the server needs to track two versions of a document - something it does not do for normal files, because it only reads them either from the clients overlay, or from the disk.

@szarnekow
Copy link
Contributor

When the user opens the file and hovers something, will I get the wrong offsets

Aren't ranges and positions agnostic to line endings? So even if the client normalizes at will, the positions would still be the same, no?

@DanTup
Copy link
Contributor

DanTup commented Aug 20, 2024

Aren't ranges and positions agnostic to line endings? So even if the client normalizes at will, the positions would still be the same, no?

Good point! Dart uses offsets internally so I forgot about this. I had a scan through the LSP spec and there aren't many places using length for anything, so maybe this is a non-issue (or at least, much less of an issue) if we just ignore it. Thanks :)

(although FWIW, I still think the idea of clients sending mutations for server-provided read-only content is weird 🙃)

@dbaeumer
Copy link
Member Author

@szarnekow good point! I should have remembered that ;-)

I feel like that would be best, but I don't know if that's a big deal (because I don't know if it means changing VS Code).

To not break anything something like that would end behind a capability.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants