Avoid indexing the same workspace multiple times #15495

dhruvmanila · 2025-01-15T12:44:22Z

Summary

This is not lazy indexing but it should somewhat help with #13686.

Currently, processing the change notifications for config files doesn't account for the fact that multiple config files could belong to the same workspace. This means that the server will re-index the same workspace n times where n is the number of file events which belongs to the same workspace. This is evident in the following trace logs:

Trace logs:

[Trace - 6:21:15 PM] Sending notification 'workspace/didChangeWatchedFiles'.
Params: {
    "changes": [
        {
            "uri": "file:///Users/dhruv/work/astral/parser-checkouts/home-assistant-core/pylint/ruff.toml",
            "type": 1
        },
        {
            "uri": "file:///Users/dhruv/work/astral/parser-checkouts/home-assistant-core/script/ruff.toml",
            "type": 2
        },
        {
            "uri": "file:///Users/dhruv/work/astral/parser-checkouts/home-assistant-core/script/scaffold/templates/ruff.toml",
            "type": 2
        },
        {
            "uri": "file:///Users/dhruv/work/astral/parser-checkouts/home-assistant-core/pyproject.toml",
            "type": 2
        }
    ]
}

...

[Trace - 6:21:19 PM] Sending notification 'workspace/didChangeWatchedFiles'.
Params: {
    "changes": [
        {
            "uri": "file:///Users/dhruv/work/astral/parser-checkouts/home-assistant-core/tests/testing_config/custom_components/ruff.toml",
            "type": 1
        },
        {
            "uri": "file:///Users/dhruv/work/astral/parser-checkouts/home-assistant-core/tests/ruff.toml",
            "type": 2
        }
    ]
}

...

Server logs:

 14.838004208s TRACE     ruff:main notification{method="workspace/didChangeWatchedFiles"}: ruff_server::server::api: enter
  14.838043583s DEBUG     ruff:main notification{method="workspace/didChangeWatchedFiles"}: ruff_server::session::index::ruff_settings: Indexing settings for workspace: /Users/dhruv/work/astral/parser-checkouts/home-assistant-core
  14.854324541s DEBUG ThreadId(55) ruff_server::session::index::ruff_settings: Ignored path via `exclude`: /Users/dhruv/work/astral/parser-checkouts/home-assistant-core/.vscode
  14.854388500s DEBUG ThreadId(55) ruff_server::session::index::ruff_settings: Ignored path via `exclude`: /Users/dhruv/work/astral/parser-checkouts/home-assistant-core/.git
  14.937713291s DEBUG     ruff:main notification{method="workspace/didChangeWatchedFiles"}: ruff_server::session::index::ruff_settings: Indexing settings for workspace: /Users/dhruv/work/astral/parser-checkouts/home-assistant-core
  14.954429833s DEBUG ThreadId(75) ruff_server::session::index::ruff_settings: Ignored path via `exclude`: /Users/dhruv/work/astral/parser-checkouts/home-assistant-core/.git
  14.954675708s DEBUG ThreadId(66) ruff_server::session::index::ruff_settings: Ignored path via `exclude`: /Users/dhruv/work/astral/parser-checkouts/home-assistant-core/.vscode
  15.041465500s DEBUG     ruff:main notification{method="workspace/didChangeWatchedFiles"}: ruff_server::session::index::ruff_settings: Indexing settings for workspace: /Users/dhruv/work/astral/parser-checkouts/home-assistant-core
  15.056731541s DEBUG ThreadId(78) ruff_server::session::index::ruff_settings: Ignored path via `exclude`: /Users/dhruv/work/astral/parser-checkouts/home-assistant-core/.vscode
  15.056796833s DEBUG ThreadId(78) ruff_server::session::index::ruff_settings: Ignored path via `exclude`: /Users/dhruv/work/astral/parser-checkouts/home-assistant-core/.git
  15.117545833s DEBUG     ruff:main notification{method="workspace/didChangeWatchedFiles"}: ruff_server::session::index::ruff_settings: Indexing settings for workspace: /Users/dhruv/work/astral/parser-checkouts/home-assistant-core
  15.133091666s DEBUG ThreadId(90) ruff_server::session::index::ruff_settings: Ignored path via `exclude`: /Users/dhruv/work/astral/parser-checkouts/home-assistant-core/.vscode
  15.133146500s DEBUG ThreadId(90) ruff_server::session::index::ruff_settings: Ignored path via `exclude`: /Users/dhruv/work/astral/parser-checkouts/home-assistant-core/.git
  15.220340666s TRACE ruff:worker:6 request{id=5 method="textDocument/diagnostic"}: ruff_server::server::api: enter
  15.220401458s DEBUG ruff:worker:6 request{id=5 method="textDocument/diagnostic"}: ruff_server::resolve: Included path via `include`: /Users/dhruv/work/astral/parser-checkouts/home-assistant-core/homeassistant/bootstrap.py
  18.577521250s TRACE     ruff:main notification{method="workspace/didChangeWatchedFiles"}: ruff_server::server::api: enter
  18.577561291s DEBUG     ruff:main notification{method="workspace/didChangeWatchedFiles"}: ruff_server::session::index::ruff_settings: Indexing settings for workspace: /Users/dhruv/work/astral/parser-checkouts/home-assistant-core
  18.616564583s DEBUG ThreadId(102) ruff_server::session::index::ruff_settings: Ignored path via `exclude`: /Users/dhruv/work/astral/parser-checkouts/home-assistant-core/.vscode
  18.616627291s DEBUG ThreadId(102) ruff_server::session::index::ruff_settings: Ignored path via `exclude`: /Users/dhruv/work/astral/parser-checkouts/home-assistant-core/.git
  18.687424250s DEBUG     ruff:main notification{method="workspace/didChangeWatchedFiles"}: ruff_server::session::index::ruff_settings: Indexing settings for workspace: /Users/dhruv/work/astral/parser-checkouts/home-assistant-core
  18.704441416s DEBUG ThreadId(114) ruff_server::session::index::ruff_settings: Ignored path via `exclude`: /Users/dhruv/work/astral/parser-checkouts/home-assistant-core/.vscode
  18.704694958s DEBUG ThreadId(121) ruff_server::session::index::ruff_settings: Ignored path via `exclude`: /Users/dhruv/work/astral/parser-checkouts/home-assistant-core/.git
  18.769627500s TRACE ruff:worker:4 request{id=6 method="textDocument/diagnostic"}: ruff_server::server::api: enter
  18.769696791s DEBUG ruff:worker:4 request{id=6 method="textDocument/diagnostic"}: ruff_server::resolve: Included path via `include`: /Users/dhruv/work/astral/parser-checkouts/home-assistant-core/homeassistant/bootstrap.py

This PR updates the logic to consider all the change events at once keeping track of the workspace path that have been already indexed.

I want to include this in tomorrow's release to check how this change would affect the linked issue.

Test Plan

Run the same scenario as above and check the logs to see that the server isn't re-indexing the same workspace multiple times:

Trace logs:

[Trace - 6:04:07 PM] Sending notification 'workspace/didChangeWatchedFiles'.
Params: {
    "changes": [
        {
            "uri": "file:///Users/dhruv/work/astral/parser-checkouts/home-assistant-core/script/ruff.toml",
            "type": 1
        },
        {
            "uri": "file:///Users/dhruv/work/astral/parser-checkouts/home-assistant-core/script/scaffold/templates/ruff.toml",
            "type": 1
        },
        {
            "uri": "file:///Users/dhruv/work/astral/parser-checkouts/home-assistant-core/pylint/ruff.toml",
            "type": 2
        },
        {
            "uri": "file:///Users/dhruv/work/astral/parser-checkouts/home-assistant-core/pyproject.toml",
            "type": 2
        }
    ]
}

...

[Trace - 6:04:11 PM] Sending notification 'workspace/didChangeWatchedFiles'.
Params: {
    "changes": [
        {
            "uri": "file:///Users/dhruv/work/astral/parser-checkouts/home-assistant-core/tests/testing_config/custom_components/ruff.toml",
            "type": 1
        },
        {
            "uri": "file:///Users/dhruv/work/astral/parser-checkouts/home-assistant-core/tests/ruff.toml",
            "type": 2
        }
    ]
}

...

Server logs:

  17.047706750s TRACE     ruff:main notification{method="workspace/didChangeWatchedFiles"}: ruff_server::server::api: enter
  17.047747875s DEBUG     ruff:main notification{method="workspace/didChangeWatchedFiles"}: ruff_server::session::index::ruff_settings: Indexing settings for workspace: /Users/dhruv/work/astral/parser-checkouts/home-assistant-core
  17.080006083s DEBUG ThreadId(54) ruff_server::session::index::ruff_settings: Ignored path via `exclude`: /Users/dhruv/work/astral/parser-checkouts/home-assistant-core/.vscode
  17.080085708s DEBUG ThreadId(54) ruff_server::session::index::ruff_settings: Ignored path via `exclude`: /Users/dhruv/work/astral/parser-checkouts/home-assistant-core/.git
  17.145328791s TRACE ruff:worker:6 request{id=5 method="textDocument/diagnostic"}: ruff_server::server::api: enter
  17.145386166s DEBUG ruff:worker:6 request{id=5 method="textDocument/diagnostic"}: ruff_server::resolve: Included path via `include`: /Users/dhruv/work/astral/parser-checkouts/home-assistant-core/homeassistant/bootstrap.py
  20.756845958s TRACE     ruff:main notification{method="workspace/didChangeWatchedFiles"}: ruff_server::server::api: enter
  20.756923375s DEBUG     ruff:main notification{method="workspace/didChangeWatchedFiles"}: ruff_server::session::index::ruff_settings: Indexing settings for workspace: /Users/dhruv/work/astral/parser-checkouts/home-assistant-core
  20.781733916s DEBUG ThreadId(66) ruff_server::session::index::ruff_settings: Ignored path via `exclude`: /Users/dhruv/work/astral/parser-checkouts/home-assistant-core/.vscode
  20.781825875s DEBUG ThreadId(75) ruff_server::session::index::ruff_settings: Ignored path via `exclude`: /Users/dhruv/work/astral/parser-checkouts/home-assistant-core/.git
  20.848340750s TRACE ruff:worker:7 request{id=6 method="textDocument/diagnostic"}: ruff_server::server::api: enter
  20.848408041s DEBUG ruff:worker:7 request{id=6 method="textDocument/diagnostic"}: ruff_server::resolve: Included path via `include`: /Users/dhruv/work/astral/parser-checkouts/home-assistant-core/homeassistant/bootstrap.py

MichaReiser · 2025-01-15T12:53:56Z

crates/ruff_server/src/session/index.rs

+    ///
+    /// This method avoids re-indexing the same workspace multiple times if multiple files
+    /// belonging to the same workspace have been changed.
+    pub(super) fn reload_settings(&mut self, changes: &[FileEvent]) {


That makes sense to me. We do something similar in Red Knot where we process all changes and compute what requires updating and only then do the updating.

github-actions · 2025-01-15T13:03:41Z

`ruff-ecosystem` results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

crates/ruff_server/src/session/index.rs

MichaReiser · 2025-01-15T13:14:22Z

Unrelated to your changes but I noticed it in the tracing logs. I think we could improve the logs by making tracing aware that the walk_dir threads run as part of the request (untested)

Index: crates/ruff_server/src/session/index/ruff_settings.rs
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/crates/ruff_server/src/session/index/ruff_settings.rs b/crates/ruff_server/src/session/index/ruff_settings.rs
--- a/crates/ruff_server/src/session/index/ruff_settings.rs	(revision f20780584daf0b42a641639f3e8da4d67e005dff)
+++ b/crates/ruff_server/src/session/index/ruff_settings.rs	(date 1736946797069)
@@ -211,8 +211,11 @@
         let index = std::sync::RwLock::new(index);
         let has_error = AtomicBool::new(has_error);
 
+        let index_span = tracing::debug_span!("walk_dir");
+
         walker.run(|| {
             Box::new(|result| {
+                let index_span = index_span.clone();
+                let _span = index_span.entered();
                 let Ok(entry) = result else {
                     return WalkState::Continue;
                 };

dhruvmanila · 2025-01-15T13:17:27Z

Unrelated to your changes but I noticed it in the tracing logs. I think we could improve the logs by making tracing aware that the walk_dir threads run as part of the request (untested)

Thank you, will look at it in a follow-up PR.

MichaReiser · 2025-01-15T13:17:43Z

This is great. It would be a much simpler fix than making indexing actually lazy!

dhruvmanila · 2025-01-15T13:28:13Z

This is great. It would be a much simpler fix than making indexing actually lazy!

It's not a complete fix because the client sends multiple didChangeWatchedFiles request as seen in the above trace logs. One of the ways to solve this would be to implement the file watching on the server side and not rely on client side implementation and we would make sure to only re-index the project once.

@MichaReiser I'm interested in hearing your opinion about whether that would be more feasible than lazy indexing. Maybe we could re-use the existing implementation from Red knot?

MichaReiser · 2025-01-15T13:36:21Z

It's not a complete fix because the client sends multiple didChangeWatchedFiles request as seen in the above trace logs.

I don't see how this can be avoided. At least not if the modifications happen too far apart. Red Knot debounces the events to combine events that happen shortly after each other, but that timeout must be small (<300ms), or users experience a lag when making changes in the editor. The server could implement its own debounce logic to debounce the change events coming from the client, but it increases the risk that we respond to requests with an outdated configuration, and it introduces a delay that will become noticeable to users.

Lazy indexing might also not help if the changes happen too far apart because VS code might request updated diagnostics, forcing Ruff to search for the configuration.

Let's see if your change improves the experience. I think it's fairly likely that it will.

dhruvmanila · 2025-01-15T14:45:13Z

The server could implement its own debounce logic to debounce the change events coming from the client, but it increases the risk that we respond to requests with an outdated configuration, and it introduces a delay that will become noticeable to users.

This is an interesting idea. I think the risk is less because at least for pull diagnostics it's only when the server will ask the client to refresh the diagnostics then the client will ask for diagnostics:

ruff/crates/ruff_server/src/server/api/notifications/did_change_watched_files.rs

Lines 25 to 47 in b5dbb2a

    
           if !params.changes.is_empty() { 
        
               if session.resolved_client_capabilities().workspace_refresh { 
        
                   requester 
        
                       .request::<types::request::WorkspaceDiagnosticRefresh>((), |()| Task::nothing()) 
        
                       .with_failure_code(lsp_server::ErrorCode::InternalError)?; 
        
               } else { 
        
                   // publish diagnostics for text documents 
        
                   for url in session.text_document_urls() { 
        
                       let snapshot = session 
        
                           .take_snapshot(url.clone()) 
        
                           .expect("snapshot should be available"); 
        
                       publish_diagnostics_for_document(&snapshot, &notifier)?; 
        
                   } 
        
               } 
        
               // always publish diagnostics for notebook files (since they don't use pull diagnostics) 
        
               for url in session.notebook_document_urls() { 
        
                   let snapshot = session 
        
                       .take_snapshot(url.clone()) 
        
                       .expect("snapshot should be available"); 
        
                   publish_diagnostics_for_document(&snapshot, &notifier)?; 
        
               } 
        
           }

We could experiment with a small amount of debounce period only after which the server will re-index the project and ask the client to refresh the diagnostics / publish the new diagnostics.

MichaReiser · 2025-01-15T14:46:46Z

We could experiment with a small amount of debounce period only after which the server will re-index the project and ask the client to refresh the diagnostics / publish the new diagnostics.

I suspect that VS codes file watcher already does some debouncing but maybe not?

dhruvmanila · 2025-01-15T15:29:15Z

I suspect that VS codes file watcher already does some debouncing but maybe not?

I'll need to look, maybe it does.

dhruvmanila added the server Related to the LSP server label Jan 15, 2025

MichaReiser reviewed Jan 15, 2025

View reviewed changes

Avoid indexing the same workspace multiple times

1ca0631

dhruvmanila force-pushed the dhruv/avoid-indexing-multiple-times branch from 1d9bf50 to 1ca0631 Compare January 15, 2025 12:56

dhruvmanila marked this pull request as ready for review January 15, 2025 12:59

dhruvmanila requested a review from MichaReiser January 15, 2025 12:59

MichaReiser approved these changes Jan 15, 2025

View reviewed changes

crates/ruff_server/src/session/index.rs Show resolved Hide resolved

Clarify doc around path for file events

27f9eb2

dhruvmanila mentioned this pull request Jan 15, 2025

Ruff server pegs CPU and blocks VSCode when changing many files #13686

Open

dhruvmanila merged commit b5dbb2a into main Jan 15, 2025
21 checks passed

dhruvmanila deleted the dhruv/avoid-indexing-multiple-times branch January 15, 2025 13:28

BrewTestBot mentioned this pull request Jan 16, 2025

ruff 0.9.2 Homebrew/homebrew-core#204488

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Avoid indexing the same workspace multiple times #15495

Avoid indexing the same workspace multiple times #15495

dhruvmanila commented Jan 15, 2025 •

edited

Loading

MichaReiser Jan 15, 2025

github-actions bot commented Jan 15, 2025 •

edited

Loading

MichaReiser commented Jan 15, 2025 •

edited

Loading

dhruvmanila commented Jan 15, 2025

MichaReiser commented Jan 15, 2025

dhruvmanila commented Jan 15, 2025

MichaReiser commented Jan 15, 2025

dhruvmanila commented Jan 15, 2025

MichaReiser commented Jan 15, 2025

dhruvmanila commented Jan 15, 2025

Avoid indexing the same workspace multiple times #15495

Avoid indexing the same workspace multiple times #15495

Conversation

dhruvmanila commented Jan 15, 2025 • edited Loading

Summary

Test Plan

MichaReiser Jan 15, 2025

Choose a reason for hiding this comment

github-actions bot commented Jan 15, 2025 • edited Loading

ruff-ecosystem results

Linter (stable)

Linter (preview)

MichaReiser commented Jan 15, 2025 • edited Loading

dhruvmanila commented Jan 15, 2025

MichaReiser commented Jan 15, 2025

dhruvmanila commented Jan 15, 2025

MichaReiser commented Jan 15, 2025

dhruvmanila commented Jan 15, 2025

MichaReiser commented Jan 15, 2025

dhruvmanila commented Jan 15, 2025

dhruvmanila commented Jan 15, 2025 •

edited

Loading

github-actions bot commented Jan 15, 2025 •

edited

Loading

`ruff-ecosystem` results

MichaReiser commented Jan 15, 2025 •

edited

Loading