Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve SCIP symbols #18758

Open
wants to merge 3 commits into
base: master
Choose a base branch
from
Open

Conversation

mgsloan
Copy link

@mgsloan mgsloan commented Dec 26, 2024

In particular, the symbol generation before this change creates a lot of symbols with the same name for different definitions. This change makes progress on symbol uniqueness, but does not fix a couple cases where it was unclear to me how to fix (see TODOs in scip.rs)

Before this change SymbolInformation provided by a document was the
info for all encountered symbols that have not yet been emitted. So,
the symbol information on a Document was a mishmash of symbols
defined in the documents, symbols from other documents, and external
symbols.

After this change, the SymbolInformation on documents is just the
locals and defined symbols from the document. All symbols referenced
and not from emitted documents are included in external_symbols.

Other behavior changes:

  • scip command now reports symbol information omitted due to symbol collisions. Iterating with this on a large codebase (Zed!) resulted in the other improvements in this change.

  • Generally fixes providing the path to nested definitions in symbols. Instead of having special cases for a couple limited cases of nesting, implements Definition::enclosing_definition and uses this to walk definitions.

  • Parameter variables are now treated like locals.

    • This fixes a bug where closure captures also received symbols scoped to the containing function. To bring back parameter symbols I would want a way to filter these out, since they can cause symbol collisions.

    • Having symbols for them seems to be intentional in 27e2eea, but no particular use is specified there. For the typical indexing purposes of SCIP I don't see why parameter symbols are useful or sensible, as function parameters are not referencable by anything but position. I can imagine they might be useful in representing diagnostics or something.

  • Inherent impls are now represented as an impl type which takes the self type as a type parameter.

  • Trait impls are now represented as an impl type which takes the self: trait type constraint as a parameter.

  • Associated types in traits and impls are now treated like types instead of type parameters, and so are now suffixed with # instead of wrapped with []. Treating them as type parameters seems to have been intentional in 73d9c77 but it doesn't make sense to me, so changing it.

  • Static variables are now treated as terms instead of Meta, and so receive . suffix instead of :.

  • Attributes are now treated as Meta instead of Macro, and so receive : suffix instead of !.

  • enclosing_symbol is now provided for labels and generic params, which are local symbols.

  • Fixes a bug where presence of ' causes a descriptor name to get double wrapped in backticks, since both fn new_descriptor and scip::symbol::format_symbol have logic for wrapping in backticks. Solution is to simply delete the redundant logic.

@rustbot rustbot added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Dec 26, 2024
@mgsloan mgsloan force-pushed the scip-unique-symbols branch 5 times, most recently from 5821914 to bac4f2d Compare December 26, 2024 04:06
In particular, the symbol generation before this change creates a lot
of symbols with the same name for different definitions. This change
makes progress on symbol uniqueness, but does not fix a couple cases
where it was unclear to me how to fix (see TODOs in `scip.rs`)

Behavior changes:

* `scip` command now reports symbol information omitted due to symbol
collisions. Iterating with this on a large codebase (Zed!) resulted in
the other improvements in this change.

* Generally fixes providing the path to nested definitions in
symbols. Instead of having special cases for a couple limited cases of
nesting, implements `Definition::enclosing_definition` and uses this
to walk definitions.

* Parameter variables are now treated like locals.

    - This fixes a bug where closure captures also received symbols
    scoped to the containing function.  To bring back parameter
    symbols I would want a way to filter these out, since they can
    cause symbol collisions.

    - Having symbols for them seems to be intentional in
    27e2eea, but no particular use is
    specified there. For the typical indexing purposes of SCIP I don't see
    why parameter symbols are useful or sensible, as function parameters
    are not referencable by anything but position. I can imagine they
    might be useful in representing diagnostics or something.

* Inherent impls are now represented as `impl#[SelfType]` - a type
named `impl` which takes a single type parameter.

* Trait impls are now represented as `impl#[SelfType][TraitType]` - a
type named `impl` which takes two type parameters.

* Associated types in traits and impls are now treated like types
instead of type parameters, and so are now suffixed with `#` instead
of wrapped with `[]`.  Treating them as type parameters seems to have
been intentional in 73d9c77 but it
doesn't make sense to me, so changing it.

* Static variables are now treated as terms instead of `Meta`, and so
receive `.` suffix instead of `:`.

* Attributes are now treated as `Meta` instead of `Macro`, and so
receive `:` suffix instead of `!`.

* `enclosing_symbol` is now provided for labels and generic params,
which are local symbols.

* Fixes a bug where presence of `'` causes a descriptor name to get
double wrapped in backticks, since both `fn new_descriptor` and
`scip::symbol::format_symbol` have logic for wrapping in
backticks. Solution is to simply delete the redundant logic.

* Deletes a couple tests in moniker.rs because the cases are
adequeately covered in scip.rs and the format for identifiers used in
moniker.rs is clunky with the new representation for trait impls
@mgsloan mgsloan force-pushed the scip-unique-symbols branch 6 times, most recently from 2ebb9df to e1aaa86 Compare December 26, 2024 08:27
Before this change `SymbolInformation` provided by a document was the
info for all encountered symbols that have not yet been emitted. So,
the symbol information on a `Document` was a mishmash of symbols
defined in the documents, symbols from other documents, and external
symbols.

After this change, the `SymbolInformation` on documents is just the
locals and defined symbols from the document.  All symbols referenced
and not from emitted documents are included in `external_symbols`.
@mgsloan mgsloan force-pushed the scip-unique-symbols branch from e1aaa86 to 124c831 Compare December 26, 2024 08:28
I'm fairly sure this is more correct, and saves space(~90mb to 82mb
for Zed's index). I'm checking in about this with SCIP folks in
sourcegraph/scip#299.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-review Status: Awaiting review from the assignee but also interested parties.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants