Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

workspaceContains starts a search over full workspace, including .git/, node_modules/ #34711

Closed
roblourens opened this issue Sep 20, 2017 · 42 comments
Assignees
Labels
extension-host Extension host issues feature-request Request for new features or functionality perf perf-startup release-notes Release notes issues verification-needed Verification of issue is requested verified Verification succeeded

Comments

@roblourens
Copy link
Member

#34487 made one improvement by batching workspaceContains searches.

Keeping this one open to consider excluding .git and node_modules in a smart way, which could give a huge improvement.

cc @jrieken @alexandrudima

We should include the exclude settings, or hardcode an exclusion to a few folders.

So, how do you then active on .git/HEAD?

Maybe add some smartness to search .git/ or node_modules/ if it's explicitly mentioned in the workspaceContains: search glob?

@alexdima
Copy link
Member

I suggest we do the following:

Analyse how many extensions in the marketplace use workspaceContains glob patterns and take a look if anyone of them would be negatively impacted if we would include the search excludes in there.

@sandy081 is looking into some tooling for such use cases.

@roblourens
Copy link
Member Author

roblourens commented Nov 7, 2017

@sandy081 can vscode-tools help us analyze this yet?

It would really help if a workspaceContains search would skip files.exclude. But in theory, extensions that activate on SCM files, particular node_modules being present, etc are valid use cases, so I'm curious whether extensions like that actually exist.

We also have #33528 to look at this for findFiles

@sandy081
Copy link
Member

sandy081 commented Nov 7, 2017

@roblourens Yes. It supports querying on activationEvent by passing a regEx value. For example

activationEvents:`^workspaceContains.*\\*.csx\`

will give you extensions activating on workspace containing .csx files.

@roblourens
Copy link
Member Author

roblourens commented Nov 8, 2017

Thanks! I only see one that depends on **/node_modules. A few others look in node_modules, but with no glob patterns, so it can be done without a search. But I guess it should be consistent whether glob patterns are present or not. One is actually mine, which just activates when a ./node_modules folder is present at all. I don't see any looking at .git/

@jens1o
Copy link
Contributor

jens1o commented Nov 21, 2017

Is this considered for November 2017? This would be important for me.

@sandy081 sandy081 removed their assignment Nov 24, 2017
@roblourens roblourens added perf-startup under-discussion Issue is under discussion for relevance, priority, approach labels Nov 26, 2017
@chrmarti
Copy link
Contributor

I wonder if there is a general way to limit the amount of resources workspaceContains consumes. While useful for medium sized workspaces, it currently is too expensive for large ones.

One option might be to stop the search after some time (e.g., 10s) and treat it as no match (to avoid grabbing more resources).

@alexdima
Copy link
Member

@chrmarti I like your suggestion.

I would also add, perhaps there is a depth setting we could use, i.e. search 4 folder levels in, and not deeper, etc.

@chrmarti
Copy link
Contributor

There is a max depth parameter for Ripgrep. The downside of that would be that it would break scenarios with few files in a deep folder structure. In other cases it might not change much, because already 4 levels can be too many files.

Maybe using file.excludes and search.exclude for filtering and a 10 seconds limit would improve things by using the knowledge in the settings about what is of interest while also having a fallback if that is too slow.

We then need to document that workspaceContains should not be relied on exclusively, which is fair considering how much resources it can take.

@roblourens
Copy link
Member Author

roblourens commented Mar 7, 2018

Yes this is exactly what I want. If you are using the C# extension, and for some reason we can't find a single csproj or .cs file in 10 seconds, then you'll probably open a csharp file at some point (activating the extension by onLanguage) or run a csharp command. If none of this, you probably won't benefit from the extension.

We could also check opened files against the workspaceContains patterns, I don't think we do that currently.

@alexdima alexdima added feature-request Request for new features or functionality and removed under-discussion Issue is under discussion for relevance, priority, approach labels Apr 20, 2018
@Thaina
Copy link

Thaina commented May 6, 2018

@chrmarti Maybe opposite of that, I mean let we have search.include instead

@chrmarti
Copy link
Contributor

chrmarti commented May 7, 2018

@Thaina search.include wouldn't have sensible defaults, so it would only be of help here if the user configured it which is asking too much from the user for fixing this issue. search.include might make sense on its own, you could open a new issue to track this as a feature request.

@garthk
Copy link

garthk commented Aug 14, 2018

If you can't limit the scope of the search cache or persist it to disk, please add a setting to disable the cache entirely. #34487 is not an “improvement” in a repo with half a million files after npm install. Rather: to solve a problem I didn't have, it's burning my battery for five minutes while chewing 2GB RAM each time the language server misses a change and forces me to reload.

@roblourens
Copy link
Member Author

roblourens commented Aug 22, 2018

Simple example of the above: #57047

Applying the exclude patterns is easier with #57046

I don't know whether we still want to do this for August this late? We could add some telemetry around how long these searches are taking and use that to help decide what the timeout value should actually be.

@roblourens
Copy link
Member Author

I think with #57046 we'll get that telemetry anyway

@chrmarti
Copy link
Contributor

Just to add to the overall picture: @reggiew2, the reporter of #55649, saw the issue when opening the home folder and the reported hardware looks similar to mine, so I ran the following measurements with (the default) and without following symlinks on my home folder (hoping that would have a somewhat similar structure to the reported one):

time (rg --files --follow 2>/dev/null | wc -l)
14974573 files in 81.09 seconds

time (rg --files 2>/dev/null | wc -l)
1474396 files in 10.538 seconds

Not following symlinks ("search.followSymlinks": false) might well help in this case.

@chrmarti
Copy link
Contributor

I forgot: Ripgrep's default is to use .gitignore files, so here are the same measurements with that turned off which is what workspaceContains does:

time (rg --files --no-ignore --follow 2>/dev/null | wc -l)
17349358 files in 83.72 seconds

time (rg --files --no-ignore 2>/dev/null | wc -l)
2225053 files in 28.880 seconds

@roblourens
Copy link
Member Author

Yeah we can use the configured values of those settings to follow what someone said above, if normal search can find it, workspaceContains will find it.

@roblourens
Copy link
Member Author

I think we should put this in the first week of September so it has plenty of time in Insiders, unless anyone has any objection.

roblourens added a commit that referenced this issue Sep 5, 2018
#34711 - run search for 5s, if it times out, activate the extension anyway
@roblourens
Copy link
Member Author

roblourens commented Sep 5, 2018

Fixed in #57047

Should use the user's excludes, and configured "use ignore files" and "follow symlinks" settings, and searches are limited to 5s, after which the extension is activated anyway.

The possible downside is that you activate extensions which actually shouldn't be activated. Hopefully this will be ok.

@roblourens roblourens added the verification-needed Verification of issue is requested label Sep 5, 2018
@alexdima alexdima removed their assignment Sep 6, 2018
@roblourens roblourens added the release-notes Release notes issues label Sep 14, 2018
roblourens added a commit that referenced this issue Sep 15, 2018
Capture workspaceContainsTimeout numbers
@roblourens
Copy link
Member Author

I've noticed extensions being activated from timeout when my macbook is doing a lot. If I set "window.restoreWindows": "all" and reopen vscode with several windows, then sometimes one will have an extension activated from timeout.

I added telemetry so we can see how much this happens in the wild

@roblourens
Copy link
Member Author

roblourens commented Sep 17, 2018

Besides what we can see in telemetry, I want to get an idea of when this will happen. I published an extension that does nothing but show a notification when it's activated, and is activated on a workspaceContains pattern that doesn't exist. If anyone wants to help out, you can install this extension https://marketplace.visualstudio.com/items?itemName=roblourens.vscode-workspacecontains-canary and take note when you see the notification.

If you open a very large workspace, or many windows all at once, or something like that, then you will see the notification and that is expected and by design. But if you see it when it's not expected, then you can take note of what else your computer was doing at the time that might have slowed search down. Then let me know if you see it happening at random times, and if so, we may want to raise the limit by a couple seconds, or tweak it somehow.

For example, I was thinking that instead of measuring 5s in extensionHostMain where we look at the workspaceContains pattern, we might want to include a timeout as a search option and measure closer to where the search is actually performed. Not across the extension API boundary but if we measure at the point where a search provider is invoked, that will avoid a couple IPC hops that could be blocked when the computer is very busy.

@octref
Copy link
Contributor

octref commented Sep 26, 2018

From my understanding, this is how it's supposed to work:

  • I create an extension that uses this activation pattern "workspaceContains:**/node_modules/express/package.json"
  • When I have **/node_modules excluded in files.exclude, the extension should not activate. However, in testing, this is not the case.
  • When I have node_modules specified in .gitignore and "search.useIgnoreFiles": true, the extension should not activate. However, in testing, this is not the case.

@octref octref reopened this Sep 26, 2018
@octref octref closed this as completed Sep 26, 2018
@octref octref added the verified Verification succeeded label Sep 26, 2018
@chpxu
Copy link

chpxu commented Oct 21, 2018

When I use Ripgrep, I get roughly 4 - 5 individual rg.exe in Task Manager,each roughly taking 5 - 15% CPU (Making my usage go to a constant 77% and slowing my PC heavily) and between 5 - 30 MB RAM each

VSCode 1.29.0-exploration (observed this in insiders)
Windows 10 1803 17134.345

@roblourens
Copy link
Member Author

Can you open a new issue with more details, because it's not clear what you mean and probably isn't relevant to this issue

@vscodebot vscodebot bot locked and limited conversation to collaborators Nov 10, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
extension-host Extension host issues feature-request Request for new features or functionality perf perf-startup release-notes Release notes issues verification-needed Verification of issue is requested verified Verification succeeded
Projects
None yet
Development

No branches or pull requests

9 participants