[grammars] provide alternative to TextMate grammars #216
Comments
@aeschli I'm interested in the thinking behind moving away from our own tokenization in favor of tmLanguages.
@Tyriar For performance reasons we want the tokenizers to run in the render process. As we don't want user code to run in the render process, we went for declarative tokenizers. We are aware of the limitations and problems of TextMate, and we are open to allowing other types of tokenizers, but no work is planned in this area at the moment.
That's a shame. It's also slightly "unfair", in the sense that Microsoft can write language modes that do things other people can't. There would be zero chance of a pull request to the core of vscode being accepted for a new language mode, so we are left unable to write a sensible language mode for vscode. We have no such problem in Eclipse: similar things are easy to write there, and they have no issue with code running in the render thread. At least Monarch had a "pop state" facility, which as far as I know has no equivalent in TextMate. In Monarch you could shift to an explicit state and then "pop", so you could write "subroutines". That facility made it at least possible to write our language mode in Monarch, even if it was much harder work than doing it the low-level way, which is what we ended up doing.
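To make the "pop state" idea above concrete, here is a minimal sketch of a stack-based line tokenizer. It is not Monarch's actual API: the `Rule`, `Action`, `Grammar`, `tokenizeLine`, and `demoGrammar` names are all hypothetical and exist only for this illustration. The `string` state acts like a subroutine, because it can be entered from either `root` or `paren` and popping returns to whichever state pushed it.

```typescript
// Illustrative only: hypothetical types and names, not Monarch's real API.
type Action = { kind: "stay" } | { kind: "push"; state: string } | { kind: "pop" };

interface Rule {
  regex: RegExp; // matched at the current position only
  token: string; // token type emitted on a match
  action: Action; // what to do with the state stack afterwards
}

type Grammar = Record<string, Rule[]>;

interface Token {
  text: string;
  type: string;
}

// Tokenizes one line. `stack` is mutated so the caller can carry it to the next line.
function tokenizeLine(grammar: Grammar, line: string, stack: string[]): Token[] {
  const tokens: Token[] = [];
  let pos = 0;
  while (pos < line.length) {
    const state = stack[stack.length - 1] ?? "root";
    let matched = false;
    for (const rule of grammar[state] ?? []) {
      const re = new RegExp(rule.regex.source, "y"); // sticky: match exactly at pos
      re.lastIndex = pos;
      const m = re.exec(line);
      if (m === null || m[0].length === 0) continue;
      tokens.push({ text: m[0], type: rule.token });
      pos += m[0].length;
      if (rule.action.kind === "push") stack.push(rule.action.state);
      else if (rule.action.kind === "pop" && stack.length > 1) stack.pop();
      matched = true;
      break;
    }
    if (!matched) {
      // No rule applies in this state: emit one invalid character and move on.
      tokens.push({ text: line.charAt(pos), type: "invalid" });
      pos += 1;
    }
  }
  return tokens;
}

const stay: Action = { kind: "stay" };
const pop: Action = { kind: "pop" };

// A toy grammar: identifiers, parenthesised arguments, double-quoted strings.
const demoGrammar: Grammar = {
  root: [
    { regex: /\(/, token: "delimiter", action: { kind: "push", state: "paren" } },
    { regex: /"/, token: "string", action: { kind: "push", state: "string" } },
    { regex: /\s+/, token: "white", action: stay },
    { regex: /\w+/, token: "identifier", action: stay },
  ],
  paren: [
    { regex: /\)/, token: "delimiter", action: pop },
    { regex: /"/, token: "string", action: { kind: "push", state: "string" } },
    { regex: /\s+/, token: "white", action: stay },
    { regex: /\w+/, token: "argument", action: stay },
  ],
  string: [
    { regex: /[^"]+/, token: "string", action: stay },
    { regex: /"/, token: "string", action: pop }, // returns to whoever pushed us
  ],
};

// Usage: the same stack is passed for every line, so open states survive line breaks.
const demoStack: string[] = ["root"];
const demoTokens = tokenizeLine(demoGrammar, 'f("a" b) c', demoStack);
console.log(demoTokens.map((t) => `${t.type}:${t.text}`).join(" "));
```

Because the stack is passed in and mutated, a string left open at the end of one line is still open when the next line is tokenized, which is the behaviour the comment describes needing.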
Tokenizing languages where a single token might be split onto multiple lines is near impossible (without very complicated workarounds) using TextMate. (see microsoft/vscode-textmate#32)
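A rough way to see the limitation described above: TextMate grammars apply each regular expression to one line at a time, so a pattern that needs to look at the following line never gets the chance. The sketch below is only an analogy using plain JavaScript regexes rather than the Oniguruma engine that vscode-textmate uses, and the `callName` pattern and sample text are made up for the illustration.

```typescript
// Analogy only: plain JS regexes, not the Oniguruma engine behind TextMate grammars.
// Hypothetical goal: highlight a name as a function call when "(" follows it,
// even if the "(" sits on the next line.
const sample = "foo\n  (bar)";
const callName = /\w+(?=\s*\()/;

// Line-at-a-time matching, as a TextMate-style tokenizer sees the text.
const perLine: (string | null)[] = sample
  .split("\n")
  .map((line) => callName.exec(line)?.[0] ?? null);

// Whole-document matching, which a full parser is free to do.
const wholeDocument: string | null = callName.exec(sample)?.[0] ?? null;

console.log(perLine); // no line matches on its own
console.log(wholeDocument); // the lookahead can cross the line break here
```

Working around this with begin/end rules is possible in some cases, but the grammar then has to commit to a scope before it has seen the text that would confirm it.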
Hello everyone. I've developed and published a syntax highlighting extension based on Tree-sitter. It provides a universal syntax coloring engine for almost any programming language (currently, C and C++ are supported out of the box). By constructing the entire syntax tree, Tree-sitter efficiently overcomes all limitations of the built-in TextMate grammars. It's very easy to add support for a new language. I'm planning to write a how-to in the next couple of days, but you can figure it out from the source code, which is very simple and straightforward. Contributions are welcome. I've been using it myself for a month, so I suppose it's ready for public use. At the least, the extension can be useful until the VSCode core provides a stronger syntax parser. You can install it from the VSCode Marketplace.
@aeschli This was 5½ years ago 👆🏼 I understand TextMate is probably as much of a thorn in your side as it is for extension authors, judging by some of @mjbvz's logged issues. I'm happy to document its pain points but I imagine you already have a query in your GitHub Issues Notebooks somewhere. Here's my understanding of the current state of things:
After a week of struggling with microsoft/vscode-textmate#32 and practically nonexistent documentation apart from a blog post from 2014 ... could we pretty please with a cherry on top have an update on this issue?
Continued from microsoft/vscode-textmate#117 (comment):
If it works for you that's great. For me most of your use case is solved by using YAML instead of JSON (like Sublime, but it's frustrating that there's a compile step for Code) and the metaprogramming facilities of embedding match content in scope names (using them like CSS classes to inject other grammars) or YAML 1.1 merge keys. How YAML looks (syntax highlighting available for embedded regexes):

    scopeName: inline.template-fsharp-highlight.reinjection
    injectionSelector: "L:meta.embedded"
    patterns:
      - name: string.quoted.triple.fsharp.template.fsharp.substitution
        contentName: meta.template.expression.fsharp
        begin: |
          (?x)    # Ignore whitespace
          (?<!\{) # Not after brace
          \{      # Literal brace
          (?!\{)  # Not before brace
        end: |
          (?x)
          (?<!})
          }
          (?!})
        captures:
          0: { name: keyword.symbol.fsharp }
        patterns:
          - include: source.fsharp

I've seen at least 2 projects that rolled their own grammar generators (the original Reason syntax and your own Better Shell Syntax). There's even a more interesting compiler (currently with documentation, online REPL, and CLI but no extension yet) to transpile an entirely new syntax with a Sublime-like stacking context into TextMate.

But to me it's all infuriating. Code is progressive in so many ways, yet it is not only regressive in a core component of literally any text editor but now also unresponsive about it. The https://github.com/microsoft/vscode-textmate project is on the one hand daunting and on the other janky, and it's impossible to tell whether that's due to TextMate's unspecified behavior or an actual bug.

Semantic tokens were a foundational step but not a solution. Most of the implementations connect them to their LSP. That's less performant than using Tree-sitter (#50140), leaves the burden on extension authors to provide a TextMate grammar when the LSP isn't available (like for a file outside a .NET project), and creates an inconsistent experience for end users in terms of coloring (whitespace-significant languages like F# are most drastically affected) as well as responsiveness. In conclusion being silent about this hurts:
@bpasero @egamma Sorry to spam you but would a separate PR proposal for #50140 be more productive? 🙏🏼
@aeschli do you know if there's any current exploration into something like Tree-sitter as a replacement for the TextMate grammars we have today? It can't just be left as it is indefinitely, as it's noticeable.
This is very true and it's actually quite sad to see too. It's great that VSCode has all of these fancy bells and whistles and more features than you can possibly need, but it seems to get the basics wrong when it comes to rendering the source code onto the screen. On TypeScript projects I see the syntax highlighting kick in a few seconds after the code shows up; this is a known issue but was given lower priority. I'd probably go as far as to say I'd happily waive any new feature for a few months if it meant time was spent on this.

I understand there's also a desire to fully rely on LSP for code colouring, but this just adds extra latency, like @texastoland mentioned above; you would definitely need some level of caching or stale-while-revalidate before falling back to LSP, otherwise it's no better than what we have today. I understand anything around tokenisation requires a refactor and that's most likely why no one wants to go near it, but how long can that last really? Until competition begins to narrow?

His last response to that thread was almost 3 years ago so I think it's a dead end. It's yet another thread where the maintainers have gone silent on the issue. I did include Tree-sitter in my post around VSCode performance as a whole: https://jason-williams.co.uk/speeding-up-vscode-extensions-in-2022
I've had a go at integrating a different service (alongside the TextMate one) which supports Tree-sitter. So far it loads up fine but there are some issues getting it to properly instantiate Tree-sitter. I think this is to do with the security policies in place. I think it's possible to have a Tree-sitter service which can emit tokens (similar to the TextMate service) and have higher-up services use that instead, or have them use the Tree-sitter API wrapped in a service (for queries etc). If anyone is interested in helping there's a PR here:
Switching to Zed today 🤦🏼‍♂️ Note: not a single reply from MS here and only 1 dismissive response in #50140 (comment)
#161479 SAD
TextMate isn't sufficient for many languages.
We have been integrating at the lower level, in the src/vs/languages directory, using Modes.IState and supports.TokenisationSupport. There needs to be a way of writing an extension that can do this, which at least currently there doesn't seem to be.
Thanks.