1.19.3 (Insiders build) Performance Enhancement Experiment: Go To Symbol #11557

Open
fearthecowboy opened this issue Oct 19, 2023 · 9 comments

@fearthecowboy
Member

fearthecowboy commented Oct 19, 2023

1.19.3-Insiders Performance Enhancement Experiment: Go To Symbol

With the 1.19.3 release of the C/C++ extension, significant changes have been made to the 'Go To symbol in the workspace' feature
(this addresses several issues, including #4934, #7908, and #7914).

This implementation is the result of an extensive investigation that I did into the performance of VSCode,
and of crafting a brand new design for implementing features such as this. As such, it is an experimental feature, and
we are looking for serious feedback on its performance and accuracy.

Call to Action

We're looking for feedback on the new experimental implementation of 'Go To symbol in the workspace' (ctrl-T) for VSCode,
both positive and otherwise - If you're able to test the new implementation and provide feedback, it would be greatly
appreciated.

Note: When you upgrade to 1.19.3, the IntelliSense browse database will be rebuilt, which may take a few minutes on large projects.

When upgrading to the 1.19.3 insiders release, you may randomly be assigned to be either in the experiment group
(using the enhancement) or in the control group (no enhancement).

If you are not in the experiment group, you can explicitly opt in to the experiment group by adding the following setting
to your settings.json file (either globally or in the workspace):

    "C_Cpp.experimentalFeatures": "enabled",

Note: Setting C_Cpp.experimentalFeatures to disabled will opt you out of the experiment group.

Once you have the setting in place, you can test the new implementation by using 'Go To symbol in the workspace' in VSCode (Ctrl-T).

Any feedback that you can provide would be greatly appreciated. Feel free to post comments in this thread with any
experience you wish to share.

Feedback

If you have feedback (positive or otherwise) on the new implementation, please post it in this thread.
We are looking for feedback on the following:

  • Performance - does the search feel sufficiently fast?

    • if it is not as fast as you would expect, tell us
  • Quality - Are you getting to the symbol you're looking for easily?

  • When giving feedback - the more details you can give, the better we can hone the results.

    • some details about the hardware you are using (OS/CPU/RAM/DISK)
    • the size of the workspace you are searching in (size on disk, total number of source files)
    • if you can provide a reproducible example, that would be very helpful. (ie, github repo, and the symbol you're looking for, and the search criteria you're using)

Details of the new implementation

The new implementation of 'GoTo symbol in the workspace' (ctrl-T) for VSCode uses an entirely new algorithm for searching
for symbols in the workspace. It uses a full-text-search index of symbols that is maintained on the fly, which
allows us to quickly find symbols using a variety of search methods.

The search is handled through several different queries, starting with very literal matches and progressively
moving to very fuzzy matches. The result is that it should return very relevant results in a fraction of the time
it previously took.
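
A minimal Python sketch of the tiered idea described above: run the strictest matcher first, then progressively looser ones, and stop once a result cap is reached. The function name `tiered_search`, the two example matchers, and the `limit` parameter are illustrative assumptions, not the extension's actual code.

```python
from typing import Callable, List, Sequence

# A matcher takes (query, symbol_name) and says whether the symbol qualifies.
Matcher = Callable[[str, str], bool]


def tiered_search(query: str, symbols: Sequence[str],
                  tiers: Sequence[Matcher], limit: int = 10000) -> List[str]:
    """Run each tier in order; earlier (stricter) tiers contribute results first."""
    results: List[str] = []
    seen = set()
    for matcher in tiers:
        for sym in symbols:
            if sym in seen:
                continue  # already found by a stricter tier
            if matcher(query, sym):
                seen.add(sym)
                results.append(sym)
                if len(results) >= limit:
                    return results  # stop early once the cap is hit
    return results


# Two placeholder tiers, strictest first (hypothetical stand-ins).
def exact(query: str, sym: str) -> bool:
    return query.lower() == sym.lower()


def substring(query: str, sym: str) -> bool:
    return query.lower() in sym.lower()
```

With these placeholder tiers, `tiered_search("foo", ["tofoo", "foo", "bar"], [exact, substring])` lists `foo` before `tofoo`, because the exact tier runs first.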

General search behavior

  • VSCode orders the results by its own relevance algorithm, so the most relevant results should be at the top.

  • VSCode filters the search results to only show results where all the characters in the input are found in the fully
    qualified symbol name, and in the order they are specified, so a search with teh will not return the, but th
    will. Generally, the more characters that are specified, the more narrow the results should be.

  • The maximum number of symbols returned from a search is 10000 - this is a reasonable limitation, both in order
    to keep the number of results to a useful maximum, and to not have it take excessively long to return results.

  • Symbol matching is generally case-insensitive, so should work regardless of the case of the input, but if there are
    too many results because of fuzzy matching, it will tend to be more accurate when casing matches the expected symbol.

  • Symbols can now be searched for in a specific scope (class or namespace) using :: in the input. For example,
    foo::bar will search for symbols named bar in the scope of foo. This is useful for finding symbols that have
    common names, but are in different scopes. For example, foo::bar will find bar in foo, but not bar in baz.

  • The scope itself can be searched, so foo:: will find symbols in any scope containing foo -- foo::bar,
    bar::foo::baz, foo_bar::baz, etc. This is useful for finding all symbols in a specific namespace. In a large
    workspace, this may return a large number of results.
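
As a rough illustration of the in-order character filter and the `::` scoping described in the list above, here is a small Python sketch. The helper names are hypothetical, and the scoped matcher uses plain case-insensitive substring checks as a simplifying assumption; the real matching is more elaborate.

```python
def passes_character_filter(query: str, qualified_name: str) -> bool:
    """True if every query character appears in the name, in the same order."""
    remaining = iter(qualified_name.lower())
    # 'ch in remaining' consumes the iterator up to the match, enforcing order.
    return all(ch in remaining for ch in query.lower())


def matches_scoped(query: str, qualified_name: str) -> bool:
    """Split 'scope::name' on the last '::' and check each half separately."""
    scope_q, _, name_q = query.lower().rpartition("::")
    scope, _, name = qualified_name.lower().rpartition("::")
    # An empty name part (as in 'foo::') accepts any symbol in a matching scope.
    return scope_q in scope and name_q in name
```

For example, `passes_character_filter("teh", "the")` is false while `passes_character_filter("th", "the")` is true, and `matches_scoped("foo::", "bar::foo::baz")` is true while `matches_scoped("foo::bar", "baz::bar")` is false.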

The following kinds of searches are performed in order:

Direct matches

Searches for symbols that match the input, or where the symbol has words that start with the input.

  • foo - will match foo and fooBar, bar_foo, bar_foo_baz
  • fooBar - will match fooBar and fooBarBaz, bar_fooBarz
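
One way to read the examples above is that the input must match as a prefix starting at a word boundary inside the symbol. The Python sketch below implements that reading; the boundary rules (start of name, after `_`, lowercase-to-uppercase transition) and the function names are assumptions made for illustration.

```python
def word_starts(name: str) -> list:
    """Indices where a word begins in a camelCase or snake_case name."""
    starts = []
    for i, ch in enumerate(name):
        if ch == "_":
            continue  # separators never start a word
        if i == 0 or name[i - 1] == "_" or (ch.isupper() and name[i - 1].islower()):
            starts.append(i)
    return starts


def is_direct_match(query: str, name: str) -> bool:
    """True if the query matches, case-insensitively, from some word start."""
    q, low = query.lower(), name.lower()
    return bool(q) and any(low.startswith(q, i) for i in word_starts(name))
```

Under this reading `foo` does not directly match `tofoo`, since `foo` does not begin at a word boundary there; that case is left to the substring search.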

Substring search

Searches for symbols that contain the input as a substring anywhere in the symbol name.

  • foo will match tofoo

Abbreviations or word searches

Searches for symbols that match the input as an abbreviation of a given symbol, or that contain the words in the input.

  • fooBar will match fooBar, foo_bar, bizFooBar, and biz_foo_bar
  • fb will match fooBar and foo_bar
  • dsmc will match doSomethingMoreComplicated as well as do_something_more_complicated
  • Scoped searches like fb::dsmc will match fooBar::doSomethingMoreComplicated
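
The examples above can be approximated by splitting names into words and comparing either whole words or word initials. This Python sketch is only one possible interpretation: the word-splitting regex, the choice to allow gaps between matched words, and all function names are assumptions.

```python
import re

# Uppercase runs, capitalized or lowercase words, and digit runs; '_' separates.
_WORD = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|[0-9]+")


def split_words(name: str) -> list:
    """'fooBar' and 'foo_bar' both become ['foo', 'bar']."""
    return [w.lower() for w in _WORD.findall(name)]


def matches_words(query: str, name: str) -> bool:
    """Query words must all appear among the symbol's words, in order."""
    wanted = split_words(query)
    remaining = iter(split_words(name))
    return bool(wanted) and all(w in remaining for w in wanted)


def matches_abbreviation(query: str, name: str) -> bool:
    """Query letters must be a run of the symbol's word initials."""
    initials = "".join(w[0] for w in split_words(name))
    return bool(query) and query.lower() in initials


def matches(query: str, qualified_name: str) -> bool:
    """Apply both checks to each '::'-separated part, aligned from the right."""
    q_parts = query.split("::")
    n_parts = qualified_name.split("::")
    if len(q_parts) > len(n_parts):
        return False
    tail = n_parts[len(n_parts) - len(q_parts):]
    return all(matches_words(q, n) or matches_abbreviation(q, n)
               for q, n in zip(q_parts, tail))
```

Because the split depends on case changes and underscores, an all-lowercase input such as `usbgain` is treated as a single word here, while `usbGain` is split into `usb` and `gain`.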

Fuzzy searching

Searching with letters that are in the symbol name in the order they appear, but not necessarily adjacent. This starts
with closer matches and progressively gets fuzzier, but will stop searching when it reaches a threshold of time.

  • vbs will match averybigsymbol
  • bip::vbs will match biginformationscope::averybigsymbol
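
A hedged Python sketch of "closer matches first, then fuzzier, until time runs out". It handles unscoped names only. The gap schedule `(1, 2, 4, 8, 16)`, the way distance is measured, and the names `fuzzy_match` and `fuzzy_search` are assumptions for illustration, not the extension's implementation.

```python
import time


def fuzzy_match(query: str, name: str, max_gap: int) -> bool:
    """True if the query's characters occur in order, each at most max_gap
    positions after the previously matched character (case-insensitive)."""
    q, s = query.lower(), name.lower()
    if not q:
        return False
    # All positions where the query prefix matched so far can end.
    ends = {i for i, ch in enumerate(s) if ch == q[0]}
    for ch in q[1:]:
        ends = {j for i in ends
                for j in range(i + 1, min(len(s), i + max_gap + 1))
                if s[j] == ch}
        if not ends:
            return False
    return True


def fuzzy_search(query, names, budget_seconds, gaps=(1, 2, 4, 8, 16)):
    """Try tight gaps first, widen progressively, stop when the budget is spent."""
    deadline = time.monotonic() + budget_seconds
    found = []
    for gap in gaps:
        for name in names:
            if time.monotonic() >= deadline:
                return found  # out of time: return what we have
            if name not in found and fuzzy_match(query, name, gap):
                found.append(name)
    return found
```

In this sketch `vbs` reaches `averybigsymbol` once the allowed gap is at least 4, so tighter matches such as a name containing `vbs` contiguously are listed ahead of it.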
@fearthecowboy fearthecowboy self-assigned this Oct 19, 2023
@bobbrow bobbrow added this to the On Deck milestone Oct 19, 2023
@sean-mcmanus sean-mcmanus modified the milestones: On Deck, 1.19 Nov 7, 2023
@fearthecowboy fearthecowboy changed the title Investigation: Performance issues 1.19.3-Insiders Performance Enhancement Experiment: Go To Symbol Feb 13, 2024
@fearthecowboy fearthecowboy pinned this issue Feb 13, 2024
@fearthecowboy fearthecowboy changed the title 1.19.3-Insiders Performance Enhancement Experiment: Go To Symbol 1.19.3 (Insiders build) Performance Enhancement Experiment: Go To Symbol Feb 13, 2024
@Shaka0723

Shaka0723 commented Feb 19, 2024

  1. Cpptools has been updated to 1.19.3
    [screenshot]

  2. "C_Cpp.experimentalFeatures": "enabled",

Test Result:
There seems to be no improvement over before.
I don't know what is wrong.
repo:
https://github.com/Shaka0723/cpptools_fuzzysearchTest

screen recording:
[screen recording: fuzzy]

@fearthecowboy
Member Author

@Shaka0723 - So, what's happening here is that the direct matches are picking up matches (so, usb is matching), but usbgain doesn't because the fuzzy searching is somewhat limited in how close the characters need to be. For usbgain to match UsbAudioSendSpeakerToVolumeGain, the range (which currently is capped to ~16 characters) would have to be increased significantly, which would take longer on very large workspaces.

If you search for usbGain (where the case change gives the algorithm something to split words on) you should be able to find that symbol.

Given that usb and usbvol initially matched your symbol early on, I'm curious if you see this as a significant productivity gap, or is this more of an extreme example?
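
To make the distance limit concrete, here is a rough Python sketch that computes the smallest gap limit under which a query would fuzzy-match a symbol. How cpptools actually measures the range is not stated in this thread; the definition used here (index distance between consecutive matched characters, case-insensitive) and the name `min_required_gap` are assumptions.

```python
from typing import Optional


def min_required_gap(query: str, symbol: str) -> Optional[int]:
    """Smallest 'largest distance between consecutive matched characters'
    over all in-order matches, or None if the query cannot match at all."""
    q, s = query.lower(), symbol.lower()
    if not q:
        return 0
    # best[i]: smallest achievable largest-gap for the prefix matched so far,
    # with its last character matched at index i.
    best = {i: 0 for i, ch in enumerate(s) if ch == q[0]}
    for ch in q[1:]:
        nxt = {}
        for j, c in enumerate(s):
            if c != ch:
                continue
            cands = [max(g, j - i) for i, g in best.items() if i < j]
            if cands:
                nxt[j] = min(cands)
        best = nxt
        if not best:
            return None
    return min(best.values()) if best else None
```

Under that assumed definition, `min_required_gap("usbgain", "UsbAudioSendSpeakerToVolumeGain")` is 25, because nothing between the `b` and the `G` can be matched, so a cap of about 16 falls short.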

@eclazi

eclazi commented Feb 20, 2024

Performance-wise, it seems much better to me 🎆. Actually usable now. It's about a 1s delay before seeing matches in a million-line C++ codebase, whereas before it was maybe 30s to a minute.

Quality of results - making sense to me.

@Shaka0723

Shaka0723 commented Feb 21, 2024

> @Shaka0723 - So, what's happening here is that the direct matches are picking up matches (so, usb is matching), but usbgain doesn't because the fuzzy searching is somewhat limited in how close the characters need to be. For usbgain to match UsbAudioSendSpeakerToVolumeGain, the range (which currently is capped to ~16 characters) would have to be increased significantly, which would take longer on very large workspaces.
>
> If you search for usbGain (where the case change gives the algorithm something to split words on) you should be able to find that symbol.
>
> Given that usb and usbvol initially matched your symbol early on, I'm curious if you see this as a significant productivity gap, or is this more of an extreme example?

I don't think this is an extreme example.
It works fine if I search symbols within a file, whether I type usbvol, usbVol, usbgain, or usbGain; that search is case-insensitive.

Also, when I type dsmc, doSomethingMoreComplicated is listed but do_something_more_complicated is not, which is quite strange.

Simply put, I expect the workspace search to perform and behave the same as symbol search within a file: case-insensitive and fuzzy.

@Shaka0723

Shaka0723 commented Feb 22, 2024

One more thing: could the user be allowed to set the character range, instead of cpptools enforcing a maximum of 16?
We are not always working in a large workspace.

> the range (which currently is capped to ~16 characters)

@sean-mcmanus
Contributor

sean-mcmanus commented Feb 23, 2024

@fearthecowboy Should we (or the user) file a feature request to add a setting for the max fuzzy character distance? But if that setting were added, it seems like we would also need to add a setting for the fuzzy search timeout, otherwise setting the distance too high could just result in the timeout getting hit, resulting in no fuzzy symbols still.

@chall1123

I would like to ask whether fuzzy searches exceeding 16 characters will be supported in the future, or whether this plugin will only support 0~16 characters.

@sean-mcmanus
Contributor

@likui1123 With https://github.com/microsoft/vscode-cpptools/releases/tag/v1.20.0, the fuzzy character limit has increased to 28.

@Shaka0723

Shaka0723 commented Apr 8, 2024

> @likui1123 With https://github.com/microsoft/vscode-cpptools/releases/tag/v1.20.0, the fuzzy character limit has increased to 28.

I tried 1.20.0, and it is better than before.

But why not let the user decide the fuzzy character limit?
E.g., make 28 the default value rather than a fixed one, and require the limit to be <100.
