-
Notifications
You must be signed in to change notification settings - Fork 3.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Environments for code fragments/buffers (i.e. elixirsense) #12645
Comments
Hi @josevalim! Thanks for including me on this. For now I don't have too much to add other than observe, but in the coming weeks I will be much more up to speed. Thanks again! |
It depends on how you classify things. Is app config a part of eval env? Is static analysis? ElixirSense can infer that
|
Anything that you conclude from looking outside of the current buffer (such as app config) is either part of the eval config (this is an environment that directly affects the |
Hi, Jose. Thank you for putting more effort into supporting LS. Let me try to understand your question and goal: so the main goal is:
And first
And after achieving it, it will make it easier for LS authors to implement completion and navigation functions. I have no objections to this overall goal. I believe it will indeed make our life easier. How does Lexical handle completion?Then I want to talk about how Lexical handles completion. And what problems we encountered when using A key feature of Lexical's completion is context awareness: for example, only modules are completed after
Currently, we use text = ~q(
defmodule MyModule do
&MyModule.fun|
end
)
env = new_env(text)
quoted =
env.document
|> Document.fragment(env.position)
|> Code.Fragment.container_cursor_to_quoted()
quoted |> dbg() The result is quoted #=> {:ok,
{:defmodule, [line: 1],
[{:__aliases__, [line: 1], [:MyModule]}, [do: {:__cursor__, [line: 2], []}]]}} We can't use that result to determine the cursor in a capture context, though it is. This is the problem we encountered in the first stage. As for the second stage, I found it difficult to know what was around my cursor just by using the current API. Taking This may also pose similar problems when implementing navigation. For example, if my cursor is in |
Exactly. This issue is about the second stage.
Every time you run into a scenario like this, please open up an issue. I will tackle the
Correct! That's what we want to solve with this issue in a way that is general and efficient.
This should be handled by |
The |
Cool, thanks; so fast.
Ok, I'll do that.
Ok, I think that part of the code is in identifier_matcher.ex#L362, I'll explore it when do the navigation task. |
This is an interesting approach, @josevalim. I really like the tree approach, and in order to keep things fast, it might help to review how the LSP protocol works, as I would like to avoid any impedance mismatches between the proposal and the LSP spec. Every time the user types a character, a If I'm understanding you correctly, we could couple this new data structure to a document when it's opened by the user, so we'd have both a textual representation (The Generally, I think this would be fine for us to do; however, edits can affect both large portions of a document (think cut and paste, and search and replace) or very small ones (every character typed or deleted by the user comes to the server as a text edit). I would think that over time, it might become necessary to occasionally collapse the edit history of the documents in order to save memory, as we don't really care much abut history when building language servers. One of the tougher problems I encountered before I started the lexical project was implementing defmodule Testing do
alias Foo.Bar.Baz
alias Baz.{Top.Bottom,
Bottom.Right
}
alias Right.Left
we'd able to say things like: and that would return: %{'Right' => 'Foo.Bar.Baz.Bottom.Right'
'Bottom' => 'Foo.Bar.Baz.Bottom',
'Baz' => 'Foo.Bar.Baz'
} If so, this would be a great help. We currently have to build this functionality ourselves. |
You got it right. Further context:
|
I sure can. Here's the "docs" for text edits. One thing to note (and that I've asked for before) is that text edits and their positions used zero-based lines and "characters", while almost all of elixir uses one-based lines and columns. I would love it if we could tell the compiler to be zero-based. I've relegated most of the conversions to framework code, but it'd be cool to not have to do this at all.
Check my work, but I don't think we can get away solely with lines, as this is valid in elixir alias Foo.Bar.{Baz, Baz.First, Baz.Second} the aliases for |
I am wondering if this is actually an Elixir bug? I am not sure if we should do left-to-right aliasing like that. EDIT: On the other hand, |
Apologies if I'm missing something, but would it be sufficient to store the line/col start and end on each node of the "tree of scopes" as its being constructed? Then answering questions about scope for an exact line/col can be done by drilling down to the leaf containing that line/col and collecting the path of nodes along the way. |
As far as I can tell, aliases have always supported this (which is just a special case of top-down aliases), but I didn't invent the language, so I don't know what the intention was. 😉 I'm not a fan of this kind of code, but it doesn't seem like a language bug to me. These are all equivalent: alias Foo.Bar.{Baz, Baz.First, Baz.Second} alias Foo.Bar.{
Baz,
Baz.First,
Baz.Second
} alias Foo.Bar.Baz
alias Baz.First
alias Baz.Second I'm pretty sure I've seen code like the above in the wild, so it might be too late to take it away. As an aside, my alias Foo.Bar.Baz
alias Foo.Bar.Baz.First
alias Foo.Bar.Baz.Second It worked too, but it was extremely complicated. |
@josevalim There is no bug here. @scohen this syntax is a bit simpler than what you described. There is actually no aliasing left to right inside
But yeah, there are a few other interesting cases with aliases (unaliasing, submodule aliases, |
err, mine (elixir 1.14.3) does differ between the two attempts
(no Baz) Those look different to me, but that's not the point I'm making here. The thing is that this is allowed, and it's important to know that I do think I see your point though:
A bit surprising. |
From looking at the examples, this does not seem the case. They are all based from the root |
So, I think lines will be sufficient then |
@josevalim I'm starting work on find references, and the first thing I've come across is context-aware knowledge of what exactly is under the cursor. For example, say I have the following defmodule ParentModule do
defstruct [:key, :value]
def function(%__MODULE__|{} = parent) do
...
end
end (the cursor is I've tried using I've noticed that both So a couple of questions:
Thanks! By the way, I've implemented an initial version of the environment that works with aliases, if you're interested. |
@scohen I think the ElixirForum is a good place to have these conversations. However, the answer is not going to be different from what I answered in the past: you will always need a combination of |
This is more a question of dueling implemtattions. |
It is up to you really. I won't be able to start on this immediately and, if this feature needs changes to the tokenizer/parser, it will only be available on Elixir v1.16. You can either:
|
One request that I have is that we make whatever solution more like Macro.Env and less like the map-based interface that iex uses. I find maps to be much too flexible of an API, and it's not clear what keys the maps have, or what values might be associated with those keys. |
I've begun working on
I will have a repo to share soon, please let me know @josevalim if you have any specific things in mind for it. Happy holidays everyone! |
The code will be important, of course, but I think the most important will be a test suite with many cases we would like to be automatically fixed. :) |
Roger! |
I have a fully compliant and error tolerant parser ready (obviously not including any bugs or syntax I haven't encountered yet) here: https://github.com/elixir-tools/spitfire. It has a test suite full of plenty of test cases, but it also tests against about 30+ popular Elixir projects on GitHub and can parse them identically to There is still room for more and better error tolerance, but I am now moving onto the next step, which is thinking of datastructure/algorithm for building the environment and thinking of the API for querying the data structure. The one thing for the future that I'm not sure how to accomplish yet is expanding macro, as they (particularly the @josevalim please let me know if you have any thoughts so far or concerns! I've already gone further than I originally planned before checking back in 😅. |
@mhanberg excellent job! My only concern is that you depend on Btw, have you had the chance of benchmarking the parser? Because a hand-written parser can lead to better error messages and custom fault-tolerance, but I worry it makes maintenance harder. But I wonder if it also leads to better performance.
May I propose something else as the next step? One of the things I have described above is that we need to track "blocks". For example, if you write: defmodule Foo do
def bar do
...
end
end If you do changes inside Building of the environment and querying of the data is most likely something that needs to exist in Elixir, because expanding the macros and collecting AST info will depend on private APIs, so we need to figure out the appropriate moment and API to hand-off those to Elixir. |
elixir_sense has a pretty comprehensive implementation that extracts info from AST though I admit the data model used there is far from optimal. Most of the test suite can be found in https://github.com/elixir-lsp/elixir_sense/blob/master/test/elixir_sense/core/metadata_builder_test.exs
ElixirLS, Lexical already vendors it as well as Sourceror library. There is a need for tokenizer as a library. It could also improve on error tolerance.
LSP editors send content modification messages https://microsoft.github.io/language-server-protocol/specifications/lsp/3.17/specification/#textDocument_didChange, the server applies those messages and constructs the current version of the document. Those messages are on character ranges level so theoretically we could map it to token ranges and then to AST nodes.
elixir_sense has some limited support for macro expansion, see |
Thanks! 😄
I had considered it, because there are some changes that could be made to tokenize as it parses, rather than parse it all up front. I punted it because I wasn't sure what parts of this work would get upstreamed into core (the Spitfire parser itself, or one of the other components yet to be built, or none or all of it). If Spitfire or some section of it was in core, I would assume it would be able to use private API. I am open to anything tho!
No benchmarking yet, I've only observed parsing times for whole projects (includes tokenizing time). For example, it parses all 272 source files in elixir-lang/elixir (134,242 loc) in 265ms and 264 files in absinthe (24,753loc) in 17ms I'll do some actual benchmarking here soon
I was planning on punting this to later because I wasn't exactly sure how to accomplish it 😅. I'm open to moving to this part tho is that is what your gut is telling you. I had just assumed it was an optimization and opted to getting the "full stack" thing working first. I still need to do some research into how this is accomplished elsewhere, but the immediate thoughts/concerns I have are:
I guess it's just the one concern for now, actually.
Another callback to my previous comment about not knowing where the lines for inclusion into core should be called. As I was source diving more compiler internals last night, the thought did occur to me that perhaps to accomplish this, we are somewhat building another compiler frontend. Also while reading through LSPs in other languages source code, I saw that rust-analyzer's description is "A Rust compiler front-end for IDEs" It seems like eventually the steps may be similar to what exists, except it uses the error tolerant parser and after expanding everything it doesn't compile into Erlang and just leave it as a data structure and/or emit tracer events. I'm curious to your thoughts on that, but we can leave it for another time to not get off topic. Thank you José! |
Hrm, I think you are right. It may make sense to optimize it only later. Doing a single pass on the code and expanding all macros at once may be fine for the initial version. My only concern is that you will have to reverse engineer all of our changes to Macro.Env and, one of the concerns
@michalmuskala do you have practical suggestions on how to handle this? Do you simply bump everyone's line offset accordingly? |
I vaguely recall that rust-analyzer handles this by storing line/column information as relative instead of absolute values and then lazily rebuilding the absolute values as you traverse the structure. So a node contains only its own extents (e.g. 10 cols, 1 line, or for multiline nodes, 5 cols start, 3 lines, 9 cols end). This allows you to replace nodes without concern for updating offsets elsewhere. |
Hrm, relative to what? Maybe relative to constructs that start a new scope, such as a new function or a new conditional? |
I'm definitely missing something in my recollection, but I think you could do this by storing, for each node, the position relative to its parent's start line/column as well as its own extent. I'll try to find where I read about this. Edit: Nope, that wouldn't work because a node changing its extent would affect the relative position of all later siblings, so you have the same issue. Edit the 2nd: I believe this reference is where the idea came from. |
It still is but it's not that bad. The tokenizer is not a stable API and AST metadata changes with every release but the high level concepts they represent (modules, protocols, defs, structs, attributes, vars) haven't changed much since elixir ~1.3
Exactly. The tools end up reimplementing parts of the compiler (e.g. alias resolution, variable tracking, function call tracking, type inference) to be able to answer questions about high level concepts represented by code. And rust-analyzer isn't the only example of a compiler frontend serving as a base for LSP servers. MS did something similar with Roslyn for C#.
@zachallaun @josevalim I've seen something similar in LSP semantic tokens that editors can use for semantic highlighting. The idea is to store token position relative to the previous token. That way whenever a token is added, moved or removed, only this token or its neighbours are affected. Maybe this could be extended to syntax trees as well. If an edit is made in a block, it should not affect the inside of a next or previous block. We would need to store which tokens (or at least first and last token) generate a particular AST node. |
As far as I know, most of the interactive-first IDEs have a setup with effectively 2 AST data structures - this was popularised by Roslyn (the C# compiler) and is roughly similar for Rust Analyzer & ELP (through Rowan), and Swift, and perhaps others. I can talk more about how it looks like in RA & ELP. First, the AST only stores byte offsets. Storing line + column information is extremely expensive. Early in building ELP we noticed that large amount of memory of the LSP was spent just storing position information. In particular, in Erlang The first AST, and the one that is persisted, is built on "Green Nodes". Those nodes only store their own length in bytes & a list of child nodes. In particular you can cheaply replace a node in the "green tree" and you only need to update the spine of the tree. By storing only the size of the node itself (including the size of children) it naturally becomes "relative" and is not invalidated by unrelated changes. Storing the sizes also allows pretty quickly finding a right node given a position - starting from the root, you can do a binary search on each layer to get to the right node in However, this is not very practical for actual traversal, since position information is useful to have. To get it, during traversal you build a "red tree" that maintains proper position information from the beginning of the file. This "red node" acts, in a way, similar to a zipper - storing the current computed information + references to current, parent, and sibling green nodes. The red nodes are never stored/persisted, instead you store a |
Thank you for the proper explanation, @michalmuskala! This is indeed what I was remembering above. Harking back to the original premise of this issue, I think this would fit well into a |
Thank you @michalmuskala! One of the things that are unclear to me is if the structure above is used for all files or only the currently open buffers. I understanding storing line+column (especially as a keyword) can be expensive, but I think that we can treat a buffer and a compiled module differently. Especially because in the buffer case you need to know line by line which imports, aliases, and so on are available to show for completion and you don't need that for compiled modules. So, in your experience, storing line+column is expensive even for buffers? In any case, it is clear there is plenty of opportunity for experimentation and depending on the release cycle of Elixir will only serve to slow things down. So I think the best way Elixir can support this is by making sure we grow our compiler API surface enough to reduce the use of private APIs. Any sort of buffer analysis tool in Elixir needs to expand macros but also specially handle some nodes. For example, Therefore, Elixir definitely needs to provide a lower level API for expanding macros. We do have Then you also need to be able to handle all special forms. To be honest, that does not sound very complicated. Here is a rudimentary pass for the type system in less than 300LOC. It is basically an AST traversal where you must consider which parts of the special forms are patterns/guards and how each special form modifies the environment. The environment modification is another part that I worry about. Some of the fields in the environment are very straight-forward to manage, such as Does that sound like a viable first step to everyone? @lukaszsamson, is there anything |
I think exposing more compiler apis is a good next step 👍. |
@josevalim I think your last post has gotten me thinking. We've spent an inordinate amount of time answering the question of "Given a document, what does the environment look like at a given line and column", and have had to recapitulate a lot of what happens in the compiler and is surfaced via compilation tracers. In addition to what you've highlighted above, @spec env_at(document :: String.t(), line :: pos_integer(), column :: pos_integer()) :: Macro.Env.t() I also think it might be nice to somehow have this in a standalone library. While it's great to get changes into core elixir, it does mean that we have to do things two ways; one way for versions that don't support the new API and another way for versions that do. Lexical supports elixir > 1.13, and it will be years before it can transition to any new way of doing things post 1.16. |
@scohen, that's pretty much what we have been discussing. :) We want to have a similar API to what you propose for any active buffer (we don't want to do that for all modules because that would be very expensive and, for compiled modules, you only need to know the result (i.e. this function was imported from X but not all possible imports)). And I agree, this solution needs to be available outside of Elixir, for availability and development speed. I'd recommend you folks to discuss at some point what some shared solution would look like when it comes to global environment and buffer environment, as originally outlined in this issue. For now I will personally only focus on providing the basic environment and macro manipulation functions in Elixir to aid building the buffer environment. For the compiler environment, tracing is pretty much the way to go (we need to add new traces though related to Mix). :) I will submit a proposal for the new functions relatively soon. |
Just to clarify for everyone, I am actively working on this issue, and it lives in a 3rd party library called Spitfire. I am going to continue to harden the parser while José/me/whomever sets/creates a public API for more compiler features in core and when those become available, I will use them in Spitfire to produce more or less the function spec that scohen shared (which I more or less already had living on my computer when I cam back to this issue to share my progress, haha). In the end, I am very open to look at Spitfire and see if any parts (the parser, the API) can be upstreamed into core, but for now it sounds like keeping it in Spitfire is what we want in order to make faster progress. |
For RA and ELP there's generally no difference between the current file and other files - everything goes through the same API. In particular they don't rely on regular compiler to do analysis (other than some diagnostics triggered on-save). This generally isn't a big problem apart from macros, and makes the analysis significantly faster (and possible to do as-you-type, rather than on-save). Conceptually, once the LSP state is loaded, the entire state is held in the process and disk or external programs are not consulted for anything. In Erlang's case regular macros are relatively easy to re-implement on top of our syntax tree. For parse transforms we just don't support them, outside of limited number of standard parse transforms that we effectively re-implement to work on top of our syntax tree. For Rust it's somewhat more complex. For |
Perfect, thank you. I think it may be worth keeping them separate in Elixir, due to the amount of work it may be required to store precise imports, aliases, and variables based on macros based on position. I guess we will learn as we move forward. |
Yes, I understood that, my worry is that spitfire will then depend on what José comes up with, though I believe you're in the same boat as lexical and elixirLS, so upon further reflection, my concern might not be particularly warranted. I was actually wondering if we could have the Elixir changes as a separate library as well. My memory might be a little fuzzy, but I think I remember Maybe we can do something similar with the new compiler enhancements? I'd like to ship them sooner rather than later. |
What @lukaszsamson does in ElixirLS is to have a copy of the functionality inside ElixirLS. For example, if new functions are added to Macro.Env, he keeps a copy as ElixirLS.Macro.Env (for example) and remove it once it is all up to date. |
It does for me
It may be straight-forward to manage on the compiler end but mapping a cursor position to an env is not that easy. Due to macros it is a 1 to many relationship e.g. in this code the cursor env is module is both
Or what is the env module and current function for quoted? Looking at the code it's easy to tell the function is
@josevalim there are other things that are useful. Besides what you mentioned
A lot of the above are not things that the compiler is aware of. They are built by standard library macros but that macros do not provide necessary introspection features. Speaking of API, I'd see it similarly to @scohen 's proposal. Id change the return type to a list of possible envs and the first argument to some syntax tree (
but the column parameter is tricky. Consider this simple example. In which column can we say that variable
|
Exactly. None of this is compiler behaviour. The best way is probably for the language server tooling to intercept those macros instead of expanding them, and then add it to its own environment. |
Hi everyone, I have created #13361 for a minimal proposal on the basic building blocks. Outside of that, my current mindset is the following. This issue argues we need two contexts: buffer context and global context. The global context can be built with tracers and reflection, and there are no plans to make it part of Elixir. We are always happy to add more traces and reflections though. The buffer context could be made part of Elixir, but right now it should not be, as doing so will only slow down progress. Instead, we are adding functions so you depend less on Elixir internals. If we make this part of Elixir in the future, we would probably make it so it receives Elixir AST as input, so it works both with Elixir's parser and external fault-tolerant parsers as Spitfire. While inclusion in Elixir can be useful, as we would be able to use it in both Livebook and IEx, it is not high priority at the moment, especially because IDEs require much more information. Thank you for all of the excellent discussions. Closing in favor of #13361. |
Hi everyone,
This is as second take on "building environments for code fragments" for IDEs (follow up on #12566). The goal of this issue is to propose a rich data structure for tracking fragment/buffers. Let's start with some background.
Status quo
Currently,
Code.Fragment
contains functions that analyze where we are in a code snippet:cursor_context
tells us where we are in a tokencontainer_cursor_to_quoted
tells us where we are in the AST (improved in Elixir v1.15 to provide additional context)So while Elixir can tell that you are inside a
alias Enumera
call or that you are in the middle of typingCode.Fr
, it does not provide any mechanism for telling you thatCode.Fragment
is a valid autocompletion forCode.Fr
.In order for us to provide such completions, we need to consider three environments:
The fragment/buffer environment. This is a file in your editor, a cell in Livebook, or a prompt in IEx. This file may or may not be valid Elixir.
The eval environment. This comes from previous cells in Livebook and previously executed code in IEx. AFAIK, it does not exist in IDEs.
The global environment. This is all modules, dependencies, etc. Exactly how those modules are compiled and searchable is a separate problem.
The goal of this issue is to build infrastructure to solve the fragment/buffer environment.
Fragment/buffer environment
Take the following code (think of text in your editor):
Our goal is to know what is the fragment/buffer environment so we can use it for completion and other features. For example: what has been imported from Baz that is available inside
bat
? What are the local functions visible frombat
? And so on.After talking with @michalmuskala, we want to approach this problem using the following techniques:
First the whole file is loaded as a fragment/buffer
We will process the fragment using a robust parser that will be capable of ignoring incomplete expressions, such as
{
in the example aboveOnce the fragment is processed, we will build a tree of scopes. For example, both
bat
andlocal_function
, each delimited by theirdo/end
blocks, depend on the scope fromdefmodule Foo do
As the user changes those files, we will replicate those diffs on top of the fragment. For example, if a line is added, then we add said line to the buffer in a way we can reuse scopes and minimize additional computations
Therefore, the goal is to always be able to quickly answer which functions, variables, imports, aliases, are available at any point in a fragment, ultimately reducing latency in IDE features. I believe this will allow us to replace
elixirsense
in a way that is robust and performant.The tasks above are major efforts in themselves, so we will need further issues to break them down, but this issue introduces a general direction.
/cc @scohen @scottming @lukaszsamson @mhanberg
The text was updated successfully, but these errors were encountered: