Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize handling of globs from extension searches #75314

Closed
lubo3395 opened this issue Jun 12, 2019 · 21 comments
Closed

Optimize handling of globs from extension searches #75314

lubo3395 opened this issue Jun 12, 2019 · 21 comments
Assignees
Labels
api feature-request Request for new features or functionality search Search widget and operation issues
Milestone

Comments

@lubo3395
Copy link

Issue Type: Performance Issue

Opening project folders with vscode slows down

VS Code version: Code 1.35.0 (553cfb2, 2019-06-04T01:17:12.481Z)
OS version: Windows_NT x64 10.0.17763

System Info
Item Value
CPUs Intel(R) Core(TM) i5-7500 CPU @ 3.40GHz (4 x 3408)
GPU Status 2d_canvas: enabled
checker_imaging: disabled_off
flash_3d: enabled
flash_stage3d: enabled
flash_stage3d_baseline: enabled
gpu_compositing: enabled
multiple_raster_threads: enabled_on
native_gpu_memory_buffers: disabled_software
rasterization: enabled
surface_synchronization: enabled_on
video_decode: enabled
webgl: enabled
webgl2: enabled
Load (avg) undefined
Memory (System) 15.89GB (6.99GB free)
Process Argv C:\Users\jedi\code\Jedi\OBG_clint
Screen Reader no
VM 0%
Process Info
CPU %	Mem MB	   PID	Process
    4	    96	 13912	code main
    1	    68	    76	   window (问题报告程序)
    0	    58	  1076	   shared-process
    0	   177	  1564	   window (Login.vue - OBG_clint - Visual Studio Code)
    0	    10	  6628	     electron-crash-reporter
    0	    11	  6976	     watcherService 
    0	     5	 16980	       console-window-host (Windows internal process)
    0	    38	 13172	     extensionHost
    0	   105	  2960	       electron_node vueServerMain.js 
    0	    24	  6472	       electron_node tsserver.js 
    0	    17	 12056	         electron_node typingsInstaller.js typesMap.js 
    0	   174	  7732	   window (main.js - demo - Visual Studio Code)
    0	    11	  2444	     watcherService 
    0	    11	  7552	       console-window-host (Windows internal process)
    0	    62	 14956	     extensionHost
    0	   117	 15816	       electron_node tsserver.js 
    0	    56	  5284	         electron_node typingsInstaller.js typesMap.js 
    0	   161	 10872	   gpu-process
Workspace Info
|  Window (Login.vue - OBG_clint - Visual Studio Code)
|  Window (main.js - demo - Visual Studio Code)
|    Folder (demo): 14 files
|      File types: js(4) json(2) vue(2) browserslistrc(1) gitignore(1) md(1)
|                  html(1) ico(1) png(1)
|      Conf files: package.json(1)
|    Folder (OBG_clint): 349 files
|      File types: js(149) vue(83) css(49) less(16) html(10) svg(9) jpg(9)
|                  png(5) json(3) ico(2)
|      Conf files: package.json(1);
Extensions (8)
Extension Author (truncated) Version
laravel-blade-spacer aus 1.0.3
vscode-intelephense-client bme 1.0.14
dotenv mik 1.0.1
vscode-language-pack-zh-hans MS- 1.35.1
mssql ms- 1.6.0
vetur oct 0.21.0
laravel-blade one 1.20.0
open-html-in-browser pea 1.2.0
@oliversalzburg
Copy link

Seems to me like node_modules are again subject to the ripgrep treatment. To work around this issue, you can put node_modules into the files.exclude setting.

@roblourens
Copy link
Member

If node_modules are not in files.exclude or .gitignore, they will be searched. Please give more details.

@roblourens roblourens added the info-needed Issue requires more information from poster label Jun 12, 2019
@oliversalzburg
Copy link

@roblourens If that is how it's supposed to work (and I think it is) then there is a regression. .gitignore has no effect, only files.exclude.

The node_modules folder is in the .gitignore of every single project in the workspace. An example for a generated command line for one of the projects in the workspace is:

"c:\Users\oliver\AppData\Local\Programs\Microsoft VS Code\resources\app\node_modules.asar.unpacked\vscode-ripgrep\bin\rg.exe" --files --hidden --case-sensitive -g /test/**/*.js -g !**/.git -g !**/.svn -g !**/.hg -g !**/CVS -g !**/.DS_Store -g !/.nyc_output -g !/mochawesome-report --no-ignore --follow --no-config --no-ignore-global

When I open the workspace, the multiple rg.exe processes fully saturate the CPU on my workstation for considerable amount of time and the system becomes unusable.

When I include node_modules in the files.exclude setting of the workspace, the rg.exe processes are extremely short-lived and there is no problem at all. The generated command line is:

"c:\Users\oliver\AppData\Local\Programs\Microsoft VS Code\resources\app\node_modules.asar.unpacked\vscode-ripgrep\bin\rg.exe" --files --hidden --case-sensitive -g /test/**/*.js -g !**/.git -g !**/.svn -g !**/.hg -g !**/CVS -g !**/.DS_Store -g !/.nyc_output -g !/mochawesome-report -g !/node_modules --no-ignore --follow --no-config --no-ignore-global

@roblourens
Copy link
Member

It looks like you have the setting search.useIgnoreFiles disabled, can you check that? That causes the --no-ignore to be used which disables gitignore

@oliversalzburg
Copy link

search.useIgnoreFiles is not configured in my user or workspace settings. The toggle in the sidebar is enabled:

image

@roblourens
Copy link
Member

Can you find the full search log?

Please do the following

  • Run the command "Set log level" and change it to "Trace"
  • Run the search again
  • Show the search-related messages in the "Log (Window)" output panel

image

@oliversalzburg
Copy link

oliversalzburg commented Jun 17, 2019

That panel is empty even after changing the log level. I'm not running any searches. I'm just opening a workspace in VS Code.

@roblourens
Copy link
Member

There are extensions triggering searches or being activated on a glob pattern which also triggers a search, if you change the log level and reload the window, you should see them. I want to see if there are any other hints about why --no-ignore is being used.

@oliversalzburg
Copy link

oliversalzburg commented Jun 17, 2019

@roblourens I took the liberty of sending the log to your gmail.

I was pretty confident that I verified this with extensions disabled, but was unable to when I tried it now :(

@roblourens
Copy link
Member

Thanks, I see what's going on, I forgot that .gitignore and search.exclude are not considered for extension-triggered searches. It has always been this way, changing it now would just be a breaking change, and some extensions rely on this. #68056

So unfortunately all I can say is that you can add these folders to files.exclude to exclude them from the extension's search. I'm guessing this is an extension related to tests based on the fact that it is looking for files matching /test/**/*.js.

@oliversalzburg
Copy link

Yeah, I figure it's https://github.com/hbenl/vscode-mocha-test-adapter I've been using this for some time now. I'm surprised this behavior is so noticeable now but has been unchanged from the last update :/

Including node_modules in files.exclude is undesirable as having the folder available in VS Code makes debugging a lot easier in many of our projects. So this is disappointing.

But I'll look into possible ways to work around the problematic behavior via the configuration options of the extension, as there shouldn't really be any reason to look into ./node_modules when the test directory is ./test anyway.

Thanks for the support.

@roblourens
Copy link
Member

Looking at the code, it looks like the extension makes several other checks before triggering a search. You might check whether one of those can help you to avoid the search entirely. https://github.com/hbenl/vscode-mocha-test-adapter/blob/40a8b1beb4f1f9790efb20f87aeb52b766a84408/src/configReader.ts#L180

Or if this is a workspace where you are not using this extension, extensions can be disabled per-workspace. Alternately it could be a bug with the extension.

@roblourens
Copy link
Member

We handle the ripgrep issue that you identified in some cases - for text searches that use the ./foo/bar syntax in "files to include" to search with a particular pattern from the root, we map this to -g '!*' -g 'foo' -g 'foo/bar' -g 'foo/bar/**'.

For a search like the one in this case, I can in theory apply the same optimization. It's often really tricky to get this right without breaking something but I'm all for anything that decreases the impact of extension searches.

@roblourens roblourens reopened this Jun 17, 2019
@roblourens roblourens changed the title Performance issues (1.35.0) Optimize handling of globs from extension searches Jun 17, 2019
@roblourens roblourens added feature-request Request for new features or functionality search Search widget and operation issues and removed info-needed Issue requires more information from poster labels Jun 17, 2019
@oliversalzburg
Copy link

IMHO the entire issues is a result from trying to abstract away too much of the complexity of search.

Sometimes you want the convenience of not including files matched by .gitignore (and the likes), sometimes you do not. Sometimes you want to follow all symlinks, sometimes you don't. Sometimes you want to find files ignoring case, sometimes not. Sometimes you care about the users settings, sometimes not so much.

As I understand it, the API in VS Code currently allows only 2 relevant arguments to target a search: One inclusive pattern and one exclusive pattern. The remaining ways to further specify the search are set outside of the scope of the search request, which should already be undesirable in itself. The result of the search must then be filtered down further as desired.

This is not ideal. First a tool, that prides itself on performance (rg), is used to collect a large list only to then filter it down far less sufficiently, wasting a lot of resources and time. Even though rg could have been used in a more targeted fashion right from the beginning.

Another aspect is, rg uses parallelization already. Yet, one process is started for every project in the workspace. Whenever there is a new issue resulting from how rg is employed in VS Code, it slows down the machine to a grinding halt. These searches should be queued and run in series.

In any case, thanks again for your help, which ultimately resulted in a solution implemented by the extension author, resolving the problem in this specific case. And I also learned a bit more about ripgrep :)

@roblourens
Copy link
Member

The remaining ways to further specify the search are set outside of the scope of the search request, which should already be undesirable in itself. The result of the search must then be filtered down further as desired.

I do have an open issue to expose more options to the findFiles extension API but that extension wouldn't want to exclude gitignore and search.excude files all the time, that was a problem in this issue: #68056. But yeah extensions should be able to use that in their searches.

Yet, one process is started for every project in the workspace. Whenever there is a new issue resulting from how rg is employed in VS Code, it slows down the machine to a grinding halt. These searches should be queued and run in series.

Not sure, I think rg will use all the resources available, whether there is one process at a time or 5. That is not a bad thing, you paid for a fast CPU, let's use it :)

Thanks for following up with the extension author, I think just excluding node_modules is a good workaround too.

@oliversalzburg
Copy link

oliversalzburg commented Jun 19, 2019

Not sure, I think rg will use all the resources available, whether there is one process at a time or 5. That is not a bad thing, you paid for a fast CPU, let's use it :)

I'd rather trust a single instance of rg to balance resource usage, then running N instances without any control over resource usage. The CPU is not the only resource being used here and the worst-case has the potential to completely freeze the system.

I have been involved or followed many tickets revolving around performance/freeze issues in VS Code and rg (or the way it is utilized) was very often the source of it. I also know you have been involved in those tickets as well. I'm sure you are aware of the potential problems that result from the current approach.

When I run a single rg --files --follow --no-ignore in a problematic directory, my CPU load for rg peaks at 60% and uses over 2 GB of memory.

image

I manually exited rg at the point I took the screenshot. It would have probably run for a while longer. The graph was plotted over several minutes of run time.

This is only a single process. If 10 of those would be running for my entire workspace (and this is what was happening when I initially posted in this ticket), the system becomes unusable. I don't even want to think about what would happen if my system would have started to page out memory to disk.

In the best-case scenario, people (or the extensions they are using) aren't searching for millions or billions of files and this use case should be perfectly served by single instances of rg running in series. Especially since, as you've also pointed out, rg already parallelizes excellently.

In the worst-case scenario, people should at least be able to still control their system and not be locked out by processes running wild due to an error.

Maybe running in series isn't even a good solution, I just experienced this so often now, that I felt maybe a better solution can be found and implemented.

@roblourens roblourens added this to the Backlog milestone Oct 27, 2019
@gluxon
Copy link

gluxon commented Feb 11, 2022

Hey @roblourens, hoping to get your help on this issue again in light of your recent fix in #138337 (comment).

It has always been this way, changing it now would just be a breaking change, and some extensions rely on this. #68056

I agree with the complexities of a fix here being a breaking change, and it was the main reason I hadn't commented before now. Noticing your other fix, would adding a default timeout here too be reasonable? I personally don't see many good reasons for extensions to search for files in the background for more than a few minutes, but it's possible I'm missing hidden complexity here. What are your thoughts?

The problem was exacerbated after we switched to pnpm, which sets up a symlinked strict node_modules structure that makes it look larger than it is to a lot of tooling. Setting "search.followSymlinks": false helps, but still leads to 100% CPU rg processes running for 5+ minutes.

It looks like the change to respect 7s timeouts in the November 2021 release only applies to filesService.ts. I was a bit excited when I saw that commit, but after tracing down vscode.workspace.findFiles's call stack, it looks like that's calling into a separate searchService.ts file.

private doSearch(query: ISearchQuery, token?: CancellationToken, onProgress?: (item: ISearchProgressItem) => void): Promise<ISearchComplete> {


For context, many members of my team have been running into this problem from extensions that fire off vscode.workspace.findFiles(...) requests.

We first saw this in rebornix/vscode-project-snippet, but considered it a one-off misbehaving extension since it's deprecated.

But since then we've isolated other extensions causing this problem. Ex:

Adding node_modules to files.exclude does resolve this problem, but we're hesitant to set this as the default in our repo's .vscode/settings.json for the same reasons @oliversalzburg mentioned above.

@roblourens
Copy link
Member

Setting "search.followSymlinks": false helps, but still leads to 100% CPU rg processes running for 5+ minutes.

And this is because it's still spending time in node_modules?

To be clear, this particular "optimize globs" issue wouldn't help with extensions searching for **/something because that query has to check every single file in your folder regardless, and they should just carefully consider whether they really have to do that...

Thanks for sending a PR upstream to that extension.

@gluxon
Copy link

gluxon commented Feb 15, 2022

And this is because it's still spending time in node_modules?

We believe so. In the rebornix/vscode-project-snippet case my coworker ran dtruss and saw a bunch of read calls into node_modules/.... Adding node_modules to files.exclude stopped the high CPU usage, which supports the theory.

To be clear, this particular "optimize globs" issue wouldn't help with extensions searching for **/something because that query has to check every single file in your folder regardless, and they should just carefully consider whether they really have to do that...

You're right, but I think it's also the case that the existing API makes it easy to accidentally search everything when **/something is the query?

I'm reading a bit more closely at the comments above and noticed that an update to the API was already considered:

I do have an open issue to expose more options to the findFiles extension API

Is the result of that this comment? #48674 (comment) I see the new API allows useSearchExcludes and useIgnoreFiles, which would definitely be more reasonable to send PRs against extensions. 🙂

It seems like migrating to whatever new function accepts FindInFilesOptions is the best way to close this issue. It requires a change in all extensions exhibiting this problem, but I think that's okay.

Thanks for sending a PR upstream to that extension.

Thanks for the suggestion that was a reasonable way to fix this. I would have felt unsure creating such a PR before reading your comment above.

@roblourens
Copy link
Member

Yeah, new API will help here, I hope to get to it soon.

@eleanorjboyd
Copy link
Member

closing as stale since there have been many API changes that have impacted performance. If you are still having problems with performance please file a new issue so we can address it in its current state. Thanks!

@eleanorjboyd eleanorjboyd closed this as not planned Won't fix, can't repro, duplicate, stale Dec 10, 2024
@vs-code-engineering vs-code-engineering bot locked and limited conversation to collaborators Jan 26, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
api feature-request Request for new features or functionality search Search widget and operation issues
Projects
None yet
Development

No branches or pull requests

6 participants