Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid indexing the same workspace multiple times #15495

Merged
merged 2 commits into from
Jan 15, 2025

Conversation

dhruvmanila
Copy link
Member

@dhruvmanila dhruvmanila commented Jan 15, 2025

Summary

This is not lazy indexing but it should somewhat help with #13686.

Currently, processing the change notifications for config files doesn't account for the fact that multiple config files could belong to the same workspace. This means that the server will re-index the same workspace n times where n is the number of file events which belongs to the same workspace. This is evident in the following trace logs:

Trace logs:

[Trace - 6:21:15 PM] Sending notification 'workspace/didChangeWatchedFiles'.
Params: {
    "changes": [
        {
            "uri": "file:///Users/dhruv/work/astral/parser-checkouts/home-assistant-core/pylint/ruff.toml",
            "type": 1
        },
        {
            "uri": "file:///Users/dhruv/work/astral/parser-checkouts/home-assistant-core/script/ruff.toml",
            "type": 2
        },
        {
            "uri": "file:///Users/dhruv/work/astral/parser-checkouts/home-assistant-core/script/scaffold/templates/ruff.toml",
            "type": 2
        },
        {
            "uri": "file:///Users/dhruv/work/astral/parser-checkouts/home-assistant-core/pyproject.toml",
            "type": 2
        }
    ]
}

...

[Trace - 6:21:19 PM] Sending notification 'workspace/didChangeWatchedFiles'.
Params: {
    "changes": [
        {
            "uri": "file:///Users/dhruv/work/astral/parser-checkouts/home-assistant-core/tests/testing_config/custom_components/ruff.toml",
            "type": 1
        },
        {
            "uri": "file:///Users/dhruv/work/astral/parser-checkouts/home-assistant-core/tests/ruff.toml",
            "type": 2
        }
    ]
}

...

Server logs:

 14.838004208s TRACE     ruff:main notification{method="workspace/didChangeWatchedFiles"}: ruff_server::server::api: enter
  14.838043583s DEBUG     ruff:main notification{method="workspace/didChangeWatchedFiles"}: ruff_server::session::index::ruff_settings: Indexing settings for workspace: /Users/dhruv/work/astral/parser-checkouts/home-assistant-core
  14.854324541s DEBUG ThreadId(55) ruff_server::session::index::ruff_settings: Ignored path via `exclude`: /Users/dhruv/work/astral/parser-checkouts/home-assistant-core/.vscode
  14.854388500s DEBUG ThreadId(55) ruff_server::session::index::ruff_settings: Ignored path via `exclude`: /Users/dhruv/work/astral/parser-checkouts/home-assistant-core/.git
  14.937713291s DEBUG     ruff:main notification{method="workspace/didChangeWatchedFiles"}: ruff_server::session::index::ruff_settings: Indexing settings for workspace: /Users/dhruv/work/astral/parser-checkouts/home-assistant-core
  14.954429833s DEBUG ThreadId(75) ruff_server::session::index::ruff_settings: Ignored path via `exclude`: /Users/dhruv/work/astral/parser-checkouts/home-assistant-core/.git
  14.954675708s DEBUG ThreadId(66) ruff_server::session::index::ruff_settings: Ignored path via `exclude`: /Users/dhruv/work/astral/parser-checkouts/home-assistant-core/.vscode
  15.041465500s DEBUG     ruff:main notification{method="workspace/didChangeWatchedFiles"}: ruff_server::session::index::ruff_settings: Indexing settings for workspace: /Users/dhruv/work/astral/parser-checkouts/home-assistant-core
  15.056731541s DEBUG ThreadId(78) ruff_server::session::index::ruff_settings: Ignored path via `exclude`: /Users/dhruv/work/astral/parser-checkouts/home-assistant-core/.vscode
  15.056796833s DEBUG ThreadId(78) ruff_server::session::index::ruff_settings: Ignored path via `exclude`: /Users/dhruv/work/astral/parser-checkouts/home-assistant-core/.git
  15.117545833s DEBUG     ruff:main notification{method="workspace/didChangeWatchedFiles"}: ruff_server::session::index::ruff_settings: Indexing settings for workspace: /Users/dhruv/work/astral/parser-checkouts/home-assistant-core
  15.133091666s DEBUG ThreadId(90) ruff_server::session::index::ruff_settings: Ignored path via `exclude`: /Users/dhruv/work/astral/parser-checkouts/home-assistant-core/.vscode
  15.133146500s DEBUG ThreadId(90) ruff_server::session::index::ruff_settings: Ignored path via `exclude`: /Users/dhruv/work/astral/parser-checkouts/home-assistant-core/.git
  15.220340666s TRACE ruff:worker:6 request{id=5 method="textDocument/diagnostic"}: ruff_server::server::api: enter
  15.220401458s DEBUG ruff:worker:6 request{id=5 method="textDocument/diagnostic"}: ruff_server::resolve: Included path via `include`: /Users/dhruv/work/astral/parser-checkouts/home-assistant-core/homeassistant/bootstrap.py
  18.577521250s TRACE     ruff:main notification{method="workspace/didChangeWatchedFiles"}: ruff_server::server::api: enter
  18.577561291s DEBUG     ruff:main notification{method="workspace/didChangeWatchedFiles"}: ruff_server::session::index::ruff_settings: Indexing settings for workspace: /Users/dhruv/work/astral/parser-checkouts/home-assistant-core
  18.616564583s DEBUG ThreadId(102) ruff_server::session::index::ruff_settings: Ignored path via `exclude`: /Users/dhruv/work/astral/parser-checkouts/home-assistant-core/.vscode
  18.616627291s DEBUG ThreadId(102) ruff_server::session::index::ruff_settings: Ignored path via `exclude`: /Users/dhruv/work/astral/parser-checkouts/home-assistant-core/.git
  18.687424250s DEBUG     ruff:main notification{method="workspace/didChangeWatchedFiles"}: ruff_server::session::index::ruff_settings: Indexing settings for workspace: /Users/dhruv/work/astral/parser-checkouts/home-assistant-core
  18.704441416s DEBUG ThreadId(114) ruff_server::session::index::ruff_settings: Ignored path via `exclude`: /Users/dhruv/work/astral/parser-checkouts/home-assistant-core/.vscode
  18.704694958s DEBUG ThreadId(121) ruff_server::session::index::ruff_settings: Ignored path via `exclude`: /Users/dhruv/work/astral/parser-checkouts/home-assistant-core/.git
  18.769627500s TRACE ruff:worker:4 request{id=6 method="textDocument/diagnostic"}: ruff_server::server::api: enter
  18.769696791s DEBUG ruff:worker:4 request{id=6 method="textDocument/diagnostic"}: ruff_server::resolve: Included path via `include`: /Users/dhruv/work/astral/parser-checkouts/home-assistant-core/homeassistant/bootstrap.py

This PR updates the logic to consider all the change events at once keeping track of the workspace path that have been already indexed.

I want to include this in tomorrow's release to check how this change would affect the linked issue.

Test Plan

Run the same scenario as above and check the logs to see that the server isn't re-indexing the same workspace multiple times:

Trace logs:

[Trace - 6:04:07 PM] Sending notification 'workspace/didChangeWatchedFiles'.
Params: {
    "changes": [
        {
            "uri": "file:///Users/dhruv/work/astral/parser-checkouts/home-assistant-core/script/ruff.toml",
            "type": 1
        },
        {
            "uri": "file:///Users/dhruv/work/astral/parser-checkouts/home-assistant-core/script/scaffold/templates/ruff.toml",
            "type": 1
        },
        {
            "uri": "file:///Users/dhruv/work/astral/parser-checkouts/home-assistant-core/pylint/ruff.toml",
            "type": 2
        },
        {
            "uri": "file:///Users/dhruv/work/astral/parser-checkouts/home-assistant-core/pyproject.toml",
            "type": 2
        }
    ]
}

...

[Trace - 6:04:11 PM] Sending notification 'workspace/didChangeWatchedFiles'.
Params: {
    "changes": [
        {
            "uri": "file:///Users/dhruv/work/astral/parser-checkouts/home-assistant-core/tests/testing_config/custom_components/ruff.toml",
            "type": 1
        },
        {
            "uri": "file:///Users/dhruv/work/astral/parser-checkouts/home-assistant-core/tests/ruff.toml",
            "type": 2
        }
    ]
}

...

Server logs:

  17.047706750s TRACE     ruff:main notification{method="workspace/didChangeWatchedFiles"}: ruff_server::server::api: enter
  17.047747875s DEBUG     ruff:main notification{method="workspace/didChangeWatchedFiles"}: ruff_server::session::index::ruff_settings: Indexing settings for workspace: /Users/dhruv/work/astral/parser-checkouts/home-assistant-core
  17.080006083s DEBUG ThreadId(54) ruff_server::session::index::ruff_settings: Ignored path via `exclude`: /Users/dhruv/work/astral/parser-checkouts/home-assistant-core/.vscode
  17.080085708s DEBUG ThreadId(54) ruff_server::session::index::ruff_settings: Ignored path via `exclude`: /Users/dhruv/work/astral/parser-checkouts/home-assistant-core/.git
  17.145328791s TRACE ruff:worker:6 request{id=5 method="textDocument/diagnostic"}: ruff_server::server::api: enter
  17.145386166s DEBUG ruff:worker:6 request{id=5 method="textDocument/diagnostic"}: ruff_server::resolve: Included path via `include`: /Users/dhruv/work/astral/parser-checkouts/home-assistant-core/homeassistant/bootstrap.py
  20.756845958s TRACE     ruff:main notification{method="workspace/didChangeWatchedFiles"}: ruff_server::server::api: enter
  20.756923375s DEBUG     ruff:main notification{method="workspace/didChangeWatchedFiles"}: ruff_server::session::index::ruff_settings: Indexing settings for workspace: /Users/dhruv/work/astral/parser-checkouts/home-assistant-core
  20.781733916s DEBUG ThreadId(66) ruff_server::session::index::ruff_settings: Ignored path via `exclude`: /Users/dhruv/work/astral/parser-checkouts/home-assistant-core/.vscode
  20.781825875s DEBUG ThreadId(75) ruff_server::session::index::ruff_settings: Ignored path via `exclude`: /Users/dhruv/work/astral/parser-checkouts/home-assistant-core/.git
  20.848340750s TRACE ruff:worker:7 request{id=6 method="textDocument/diagnostic"}: ruff_server::server::api: enter
  20.848408041s DEBUG ruff:worker:7 request{id=6 method="textDocument/diagnostic"}: ruff_server::resolve: Included path via `include`: /Users/dhruv/work/astral/parser-checkouts/home-assistant-core/homeassistant/bootstrap.py

@dhruvmanila dhruvmanila added the server Related to the LSP server label Jan 15, 2025
///
/// This method avoids re-indexing the same workspace multiple times if multiple files
/// belonging to the same workspace have been changed.
pub(super) fn reload_settings(&mut self, changes: &[FileEvent]) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense to me. We do something similar in Red Knot where we process all changes and compute what requires updating and only then do the updating.

@dhruvmanila dhruvmanila force-pushed the dhruv/avoid-indexing-multiple-times branch from 1d9bf50 to 1ca0631 Compare January 15, 2025 12:56
@dhruvmanila dhruvmanila marked this pull request as ready for review January 15, 2025 12:59
Copy link
Contributor

github-actions bot commented Jan 15, 2025

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

@MichaReiser
Copy link
Member

MichaReiser commented Jan 15, 2025

Unrelated to your changes but I noticed it in the tracing logs. I think we could improve the logs by making tracing aware that the walk_dir threads run as part of the request (untested)

Index: crates/ruff_server/src/session/index/ruff_settings.rs
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/crates/ruff_server/src/session/index/ruff_settings.rs b/crates/ruff_server/src/session/index/ruff_settings.rs
--- a/crates/ruff_server/src/session/index/ruff_settings.rs	(revision f20780584daf0b42a641639f3e8da4d67e005dff)
+++ b/crates/ruff_server/src/session/index/ruff_settings.rs	(date 1736946797069)
@@ -211,8 +211,11 @@
         let index = std::sync::RwLock::new(index);
         let has_error = AtomicBool::new(has_error);
 
+        let index_span = tracing::debug_span!("walk_dir");
+
         walker.run(|| {
             Box::new(|result| {
+                let index_span = index_span.clone();
+                let _span = index_span.entered();
                 let Ok(entry) = result else {
                     return WalkState::Continue;
                 };

@dhruvmanila
Copy link
Member Author

Unrelated to your changes but I noticed it in the tracing logs. I think we could improve the logs by making tracing aware that the walk_dir threads run as part of the request (untested)

Thank you, will look at it in a follow-up PR.

@MichaReiser
Copy link
Member

This is great. It would be a much simpler fix than making indexing actually lazy!

@dhruvmanila
Copy link
Member Author

This is great. It would be a much simpler fix than making indexing actually lazy!

It's not a complete fix because the client sends multiple didChangeWatchedFiles request as seen in the above trace logs. One of the ways to solve this would be to implement the file watching on the server side and not rely on client side implementation and we would make sure to only re-index the project once.

@MichaReiser I'm interested in hearing your opinion about whether that would be more feasible than lazy indexing. Maybe we could re-use the existing implementation from Red knot?

@dhruvmanila dhruvmanila merged commit b5dbb2a into main Jan 15, 2025
21 checks passed
@dhruvmanila dhruvmanila deleted the dhruv/avoid-indexing-multiple-times branch January 15, 2025 13:28
@MichaReiser
Copy link
Member

It's not a complete fix because the client sends multiple didChangeWatchedFiles request as seen in the above trace logs.

I don't see how this can be avoided. At least not if the modifications happen too far apart. Red Knot debounces the events to combine events that happen shortly after each other, but that timeout must be small (<300ms), or users experience a lag when making changes in the editor. The server could implement its own debounce logic to debounce the change events coming from the client, but it increases the risk that we respond to requests with an outdated configuration, and it introduces a delay that will become noticeable to users.

Lazy indexing might also not help if the changes happen too far apart because VS code might request updated diagnostics, forcing Ruff to search for the configuration.

Let's see if your change improves the experience. I think it's fairly likely that it will.

@dhruvmanila
Copy link
Member Author

The server could implement its own debounce logic to debounce the change events coming from the client, but it increases the risk that we respond to requests with an outdated configuration, and it introduces a delay that will become noticeable to users.

This is an interesting idea. I think the risk is less because at least for pull diagnostics it's only when the server will ask the client to refresh the diagnostics then the client will ask for diagnostics:

if !params.changes.is_empty() {
if session.resolved_client_capabilities().workspace_refresh {
requester
.request::<types::request::WorkspaceDiagnosticRefresh>((), |()| Task::nothing())
.with_failure_code(lsp_server::ErrorCode::InternalError)?;
} else {
// publish diagnostics for text documents
for url in session.text_document_urls() {
let snapshot = session
.take_snapshot(url.clone())
.expect("snapshot should be available");
publish_diagnostics_for_document(&snapshot, &notifier)?;
}
}
// always publish diagnostics for notebook files (since they don't use pull diagnostics)
for url in session.notebook_document_urls() {
let snapshot = session
.take_snapshot(url.clone())
.expect("snapshot should be available");
publish_diagnostics_for_document(&snapshot, &notifier)?;
}
}

We could experiment with a small amount of debounce period only after which the server will re-index the project and ask the client to refresh the diagnostics / publish the new diagnostics.

@MichaReiser
Copy link
Member

We could experiment with a small amount of debounce period only after which the server will re-index the project and ask the client to refresh the diagnostics / publish the new diagnostics.

I suspect that VS codes file watcher already does some debouncing but maybe not?

@dhruvmanila
Copy link
Member Author

I suspect that VS codes file watcher already does some debouncing but maybe not?

I'll need to look, maybe it does.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
server Related to the LSP server
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants