Attempt to optimize LocalExecutor #144

Draft: wants to merge 3 commits into base: master

james7132
Contributor

@james7132 james7132 commented Jul 24, 2025

This PR makes LocalExecutor its own separate implementation instead of wrapping an Executor, forking its own State type that uses unsynchronized !Send types where possible: Arc -> Rc, Mutex -> RefCell, AtomicPtr -> Cell<*mut T>, ConcurrentQueue -> VecDeque. This implementation also removes any extra operations that assume there are other concurrent Runners/Tickers (i.e. local queues, extra notifications).
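For readers skimming the thread, here is a rough sketch of what the type swap looks like in miniature. It is not the code in this PR: `Job`, `LocalState`, `LocalHandle`, `try_tick` and `demo` are hypothetical names, a boxed closure stands in for a real task, and a `Cell<usize>` counter stands in for the atomic fields.

```rust
use std::cell::{Cell, RefCell};
use std::collections::VecDeque;
use std::rc::Rc;

/// Hypothetical stand-in for a runnable task (assumption, not the crate's type).
type Job = Box<dyn FnOnce()>;

/// Single-threaded state: no atomics and no locks.
struct LocalState {
    /// Plays the role of the concurrent queue in the multi-threaded executor.
    queue: RefCell<VecDeque<Job>>,
    /// Plays the role of an atomic field; Cell is enough on one thread.
    ticks: Cell<usize>,
}

/// Cloneable handle; holding an Rc instead of an Arc makes it !Send.
#[derive(Clone)]
struct LocalHandle {
    state: Rc<LocalState>,
}

impl LocalHandle {
    fn new() -> Self {
        LocalHandle {
            state: Rc::new(LocalState {
                queue: RefCell::new(VecDeque::new()),
                ticks: Cell::new(0),
            }),
        }
    }

    /// Queues a job without any synchronization.
    fn schedule(&self, job: impl FnOnce() + 'static) {
        self.state.queue.borrow_mut().push_back(Box::new(job));
    }

    /// Runs one queued job; returns false when the queue is empty.
    fn try_tick(&self) -> bool {
        // Pop in its own statement so the RefCell borrow ends before the job
        // runs; the job may call schedule() re-entrantly.
        let job = self.state.queue.borrow_mut().pop_front();
        match job {
            Some(job) => {
                job();
                self.state.ticks.set(self.state.ticks.get() + 1);
                true
            }
            None => false,
        }
    }

    /// Number of jobs run so far.
    fn ticks(&self) -> usize {
        self.state.ticks.get()
    }
}

/// Schedules a job that schedules another job, then drains the queue.
fn demo() -> (Vec<u32>, usize) {
    let ex = LocalHandle::new();
    let log: Rc<RefCell<Vec<u32>>> = Rc::new(RefCell::new(Vec::new()));
    let (ex2, log2) = (ex.clone(), log.clone());
    ex.schedule(move || {
        log2.borrow_mut().push(1);
        let log3 = log2.clone();
        ex2.schedule(move || {
            log3.borrow_mut().push(2);
        });
    });
    while ex.try_tick() {}
    let out = log.borrow().clone();
    (out, ex.ticks())
}
```

The one thing to watch with RefCell is re-entrancy: the borrow on the queue has to be released before running a job, otherwise a job that schedules more work would panic on a double borrow.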

For testing, I've duplicated most of the single-threaded compatible executor tests to ensure there's equivalent coverage on the new independent LocalExecutor.

I've also made an attempt to write this with UnsafeCell instead of RefCell, but this litters the code with a huge amount of unsafe, which might be too much for this crate. The gains here might not be worth it.
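To illustrate the trade-off, a minimal sketch of the same queue written both ways. `CheckedQueue` and `UncheckedQueue` are hypothetical names, and the element type is `u32` purely to keep the example small; the actual experiment is not shown in this thread.

```rust
use std::cell::{RefCell, UnsafeCell};
use std::collections::VecDeque;

/// RefCell version: every access pays for a runtime borrow check.
struct CheckedQueue(RefCell<VecDeque<u32>>);

impl CheckedQueue {
    fn new() -> Self {
        CheckedQueue(RefCell::new(VecDeque::new()))
    }
    fn push(&self, v: u32) {
        self.0.borrow_mut().push_back(v);
    }
    fn pop(&self) -> Option<u32> {
        self.0.borrow_mut().pop_front()
    }
}

/// UnsafeCell version: no runtime check, but every access needs an unsafe
/// block and a manual argument for why it is sound.
struct UncheckedQueue(UnsafeCell<VecDeque<u32>>);

impl UncheckedQueue {
    fn new() -> Self {
        UncheckedQueue(UnsafeCell::new(VecDeque::new()))
    }
    fn push(&self, v: u32) {
        // SAFETY: the type is !Sync, the reference never escapes this method,
        // and pushing a u32 cannot call back into this queue.
        let q = unsafe { &mut *self.0.get() };
        q.push_back(v);
    }
    fn pop(&self) -> Option<u32> {
        // SAFETY: same reasoning as in push().
        let q = unsafe { &mut *self.0.get() };
        q.pop_front()
    }
}

/// Pushes the same values through both queues and returns what comes out.
fn round_trip() -> (Vec<u32>, Vec<u32>) {
    let checked = CheckedQueue::new();
    let unchecked = UncheckedQueue::new();
    for v in [1, 2, 3] {
        checked.push(v);
        unchecked.push(v);
    }
    let mut a = Vec::new();
    while let Some(v) = checked.pop() {
        a.push(v);
    }
    let mut b = Vec::new();
    while let Some(v) = unchecked.pop() {
        b.push(v);
    }
    (a, b)
}
```

With plain integers the safety argument is short. With tasks and wakers that can re-enter the executor, each of those SAFETY comments becomes much harder to justify, which is the cost being weighed here.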

I previously wrote some additional benchmarks but lost the changes to a stray git reset --hard when benchmarking.
The gains here are substantial on most benchmarks; spawn_batch and spawn_many_local show no statistically significant change. Here are the results:

single_thread/local_executor::spawn_one
                        time:   [130.05 ns 130.98 ns 132.16 ns]
                        change: [-80.586% -80.413% -80.214%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 14 outliers among 100 measurements (14.00%)
  4 (4.00%) high mild
  10 (10.00%) high severe
single_thread/local_executor::spawn_batch
                        time:   [17.195 µs 22.167 µs 32.281 µs]
                        change: [-30.007% +0.4082% +41.137%] (p = 0.98 > 0.05)
                        No change in performance detected.
Found 17 outliers among 100 measurements (17.00%)
  7 (7.00%) high mild
  10 (10.00%) high severe
Benchmarking single_thread/local_executor::spawn_many_local: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 856.5s, or reduce sample count to 10.
single_thread/local_executor::spawn_many_local
                        time:   [2.7766 ms 2.8091 ms 2.8453 ms]
                        change: [-21.653% -8.5454% +6.4249%] (p = 0.30 > 0.05)
                        No change in performance detected.
Found 6 outliers among 100 measurements (6.00%)
  2 (2.00%) high mild
  4 (4.00%) high severe
single_thread/local_executor::spawn_recursively
                        time:   [19.618 ms 20.071 ms 20.558 ms]
                        change: [-24.237% -22.461% -20.762%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  6 (6.00%) high mild
Benchmarking single_thread/local_executor::yield_now: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 9.7s, enable flat sampling, or reduce sample count to 50.
single_thread/local_executor::yield_now
                        time:   [1.8382 ms 1.8421 ms 1.8470 ms]
                        change: [-52.888% -52.744% -52.570%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
  1 (1.00%) high mild
  9 (9.00%) high severe
single_thread/local_executor::channels
                        time:   [9.9259 ms 9.9355 ms 9.9452 ms]
                        change: [-15.797% -15.656% -15.533%] (p = 0.00 < 0.05)
                        Performance has improved.
single_thread/local_executor::web_server
                        time:   [54.839 µs 56.776 µs 59.062 µs]
                        change: [-23.926% -18.725% -13.009%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  5 (5.00%) high mild
  1 (1.00%) high severe

TODO: Recreate these benchmarks.

@james7132
Contributor Author

Hmmm, VecDeque::new is not const until Rust 1.68. It could be trivially replaced with LinkedList, which has a const constructor in 1.63, but that would be a major performance regression.
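A small sketch of where the const requirement bites, assuming the constructor that owns the queue needs to be a `const fn`. `Queue`, `QUEUE` and `push_and_len` are hypothetical names, not items from this crate; the block should only compile on Rust 1.68 or later.

```rust
use std::cell::RefCell;
use std::collections::VecDeque;

/// Hypothetical single-threaded queue wrapper.
struct Queue {
    items: RefCell<VecDeque<u32>>,
}

impl Queue {
    /// A const constructor; this needs VecDeque::new to be const (Rust 1.68+).
    const fn new() -> Self {
        Queue {
            items: RefCell::new(VecDeque::new()),
        }
    }
}

thread_local! {
    // Const-initialized thread-local, one use case for a const constructor.
    static QUEUE: Queue = const { Queue::new() };
}

/// Pushes a value onto this thread's queue and returns the new length.
fn push_and_len(v: u32) -> usize {
    QUEUE.with(|q| {
        let mut items = q.items.borrow_mut();
        items.push_back(v);
        items.len()
    })
}
```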

@taiki-e
Collaborator

taiki-e commented Jul 25, 2025

There is no need to be worried about the MSRV of const VecDeque::new, because we will be able to raise the MSRV after about two weeks: smol-rs/smol#244 (comment)
