-
Notifications
You must be signed in to change notification settings - Fork 71
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make Eio.Semaphore lock-free #398
Conversation
Before, two domains just passed control back and forth, which was really just timing how long it takes to wake a domain. Now, we keep the domains busy while waiting for the semaphore. We also now have 4 domains trying to use 2-4 resources, which is a more realistic use of a semaphore.
lib_eio/sem_state.ml
Outdated
(* This state is unreachable because we (the provider) haven't set this yet *) | ||
assert false | ||
in | ||
aux () |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not yet fully understand the Cells
abstraction. However, looking at the code here it seems that Resumed
is a terminal state. If that is the case, then it may be safe to always exchange the state to Resumed
. Like I said, I do not yet fully understand Cells
so this might be wrong.
However, assuming that exchanging to Resumed
is a safe, then the above could be changed to eliminate the aux
loop:
let rec resume t =
let cell = Cells.next_resume t.cells in
match (Atomic.exchange cell Resumed : cell) with
| Request r ->
(* The common case: there was a waiter for the value *)
r ()
| (Cancelled
(* The waker had finished cancelling. Ignore it and resume the next one. *)
| Resumed
(* We lost the race. Ignore and resume the next one. *)) ->
resume t
| (Empty
(* The consumer had reserved this cell but not yet stored the request.
We placed Resumed there and the consumer will handle it soon. *)
| Cancelling
(* The waker had started cancelling. We let it know we want to resume it
and then let it handle it. *)) ->
()
(I edited the above to change the response to Resumed
where the race was lost.)
Assuming it is safe to always set to Resumed
, the above version avoids the loop overhead and only does a single atomic read-write. Assuming there are not many racing to do the same this may be slightly faster. This also likely results in a smaller amount of machine code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One side-effect is that calling cancel
twice currently reliably raises Invalid_argument
, whereas with this it might return false
. However, the only current user of this module (Semaphore) should never call cancel
twice anyway.
It doesn't seem noticeably faster (with dune exec -- ./bench/bench_semaphore.exe
any effect seems lost in the noise, on my machine anyway).
Assuming there are not many racing to do the same this may be slightly faster.
The only thing we can be racing with is a single consumer thread. The next_resume
means this is the only resume with this cell. Likewise, there is only one suspender with this cell. And if the suspender cancels, it is required to call cancel
only from the same domain.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This also likely results in a smaller amount of machine code.
We could combine the Empty
and Cancelled
handlers to save some code though (they're identical). But maybe it's clearer as it is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me just play a devil's advocate a bit here. :)
Perhaps one possibility might be to merge Resumed
and Cancelled
into a single, let's say, Terminal
state. There is a race to set the cell to Terminal
state. The action depends on the state from which the Terminal
state was reached.
One side-effect is that calling cancel twice currently reliably raises Invalid_argument
About cancel
. Hmm... Under which circumstances one might call cancel
twice? (Asking because I'm not sure I understand the concern.)
Assuming there is a race to call cancel
, I would tend to think that it is important that cancel
returns true
at most once. This way any follow up actions to successful cancellation would only be performed once. If this makes sense, then the API might be more user friendly without raising Invalid_argument. John Ousterhout talks about this in his talk on software design here.
ff5305f
to
58f2e09
Compare
Makes sense. I updated the PR (I called it Finished though, mainly because I started just before seeing your comment!).
It would always be a bug. I was just noting it as a change to this module's external behaviour (but it's only used by Semaphore anyway, and that will only call it once). Assuming the new version is correct, I think we should stop and admire that fact that our version has only 4 states, while the original paper had 9! (https://arxiv.org/pdf/2111.12682.pdf section 3.1 plus the REFUSE state added in 3.2) Thanks! |
lib_eio/sem_state.ml
Outdated
(* To call [cancel] the user needs a [request] value, | ||
which they only get once we've reached the [Request] state. | ||
[Empty] is unreachable from [Request]. *) | ||
assert false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm... Speaking of reducing states, I wonder whether Empty
and Cancelling
states could actually be merged.
Is this the only place where they are handled differently? And both indicate a programming error?
Thinking of the approach here in more general terms, both Empty
and Cancelling
seem to be kind of InTransition
states where the suspender is planning to do something and needs to indicate that to resumers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point - done :-)
We must wait until one of the existing users increments the counter and resumes us. | ||
It's OK if they resume before we suspend; we'll just pick up the token they left. *) | ||
Suspend.enter_unchecked (fun ctx enqueue -> | ||
match Sem_state.suspend t.state (fun () -> enqueue (Ok ())) with |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there an opportunity here to avoid the Suspend.enter_unchecked
call (which, I assume, captures the continuation) by exposing the Sem_state
protocol a bit more. Basically, require the acquirer to separately create the cell. After creating the cell, the acquirer would then read the cell state (as is done in Sem_state.suspend
at the moment in case the CAS fails) before capturing the continuation. If the state has already been set to Finished
, then there is no need to capture a continuation. Otherwise capture continuation and proceed as before. Basically, this results in a kind of double-checked pattern. This could improve performance in highly contested cases where it is possible that a resumer actually manages to see the InTransition
(or Empty
) state. The downside, of course, is making the Sem_state
API/protocol more complex (adding an extra step).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think so. This case wouldn't work:
- The suspender decrements the count to say they're planning to suspend.
- The resumer resumes the cell.
- The suspender creates the cell.
We have to be able to initialise cells before the suspender does anything beyond modifying the counter. We don't want it creating cells before changing the counter because that includes the fast path (where we don't need a cell at all).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to be clear, this is what I had in mind:
diff --git a/lib_eio/sem_state.ml b/lib_eio/sem_state.ml
index 24a301c..7ffebf6 100644
--- a/lib_eio/sem_state.ml
+++ b/lib_eio/sem_state.ml
@@ -109,6 +109,26 @@ let acquire t =
which happens if we decremented *from* a positive one. *)
s > 0
+let prepare_suspend t =
+ Cells.next_suspend t.cells
+
+let is_in_transition (_, (cell: cell Atomic.t)) =
+ match Atomic.get cell with
+ | In_transition -> true
+ | Finished | Request _ -> false
+
+let perform_suspend t (segment, (cell: cell Atomic.t)) k : request option =
+ if Atomic.compare_and_set cell In_transition (Request k) then Some (t, segment, cell)
+ else (
+ (* We got resumed before we could add the waiter. *)
+ k ();
+ None
+ )
+
let suspend t k : request option =
let (segment, cell) = Cells.next_suspend t.cells in
if Atomic.compare_and_set cell In_transition (Request k) then Some (t, segment, cell)
diff --git a/lib_eio/semaphore.ml b/lib_eio/semaphore.ml
index 2733be5..a90fa64 100644
--- a/lib_eio/semaphore.ml
+++ b/lib_eio/semaphore.ml
@@ -20,21 +20,24 @@ let acquire t =
(* No free resources.
We must wait until one of the existing users increments the counter and resumes us.
It's OK if they resume before we suspend; we'll just pick up the token they left. *)
- Suspend.enter_unchecked (fun ctx enqueue ->
- match Sem_state.suspend t.state (fun () -> enqueue (Ok ())) with
- | None -> () (* Already resumed *)
- | Some request ->
- Ctf.note_try_read t.id;
- match Fiber_context.get_error ctx with
- | Some ex ->
- if Sem_state.cancel request then enqueue (Error ex);
- (* else already resumed *)
- | None ->
- Fiber_context.set_cancel_fn ctx (fun ex ->
- if Sem_state.cancel request then enqueue (Error ex)
- (* else already resumed *)
- )
- )
+ let segment_cell = Sem_state.prepare_suspend t.state in
+ (* We may have already been resumed at this point. So, check before capturing continuation. *)
+ if Sem_state.is_in_transition segment_cell then (
+ Suspend.enter_unchecked (fun ctx enqueue ->
+ match Sem_state.perform_suspend t.state segment_cell (fun () -> enqueue (Ok ())) with
+ | None -> () (* Already resumed *)
+ | Some request ->
+ Ctf.note_try_read t.id;
+ match Fiber_context.get_error ctx with
+ | Some ex ->
+ if Sem_state.cancel request then enqueue (Error ex);
+ (* else already resumed *)
+ | None ->
+ Fiber_context.set_cancel_fn ctx (fun ex ->
+ if Sem_state.cancel request then enqueue (Error ex)
+ (* else already resumed *)
+ )
+ )
+ )
);
Ctf.note_read t.id
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see. Pushing some of the work to outside of the suspend in case we can skip the suspend by the time it's done, at the cost of an extra Atomic.get
.
I tried measuring it by incrementing slow
and fast
counters in is_in_transition
and running the benchmark gives:
slow=850475, fast=75, frac=0.01%
So probably not worth it, I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I guess I should have disabled the single-process runs before measuring that... it's slow=51221, fast=98, frac=0.19%
now (0.19% of the slow path cases). But I don't think that changes anything.
58f2e09
to
5c9fac1
Compare
This uses the new Cells module to replace the use of a mutex. Co-authored-by: Vesa Karvonen <vesa.a.j.k@gmail.com>
5c9fac1
to
97acd4b
Compare
CHANGES: New features: - Add `Eio.Net.run_server` (@bikallem @talex5 ocaml-multicore/eio#408). Runs an accept loop in one or more domains, with cancellation and graceful shutdown, and an optional maximum number of concurrent connections. - Add `Buf_read.BE` and `LE` parsers (@Cjen1 ocaml-multicore/eio#399). Parse numbers in various binary formats. - Add `Eio.Buf_read.uint8` (@talex5 ocaml-multicore/eio#418). Performance: - Make `Eio.Condition` lock-free (@talex5 ocaml-multicore/eio#397 ocaml-multicore/eio#381). In addition to being faster, this allows using conditions in signal handlers. - Make `Eio.Semaphore` lock-free (@talex5 @polytypic ocaml-multicore/eio#398). - Make `Eio.Stream` lock-free when the capacity is zero (@talex5 ocaml-multicore/eio#413 ocaml-multicore/eio#411). - Make `Eio.Promise` lock-free (@talex5 ocaml-multicore/eio#401). Bug fixes: - eio_linux: call `Uring.submit` as needed (@talex5 @bikallem ocaml-multicore/eio#428). Previously, we could fail to submit a job promptly because the SQE queue was full. - Fix luv signals (@haesbaert ocaml-multicore/eio#412). `libuv` automatically retries polling if it gets `EINTR`, without giving OCaml signal handlers a chance to run. - eio_luv: fix some resource leaks (@talex5 @patricoferris ocaml-multicore/eio#421). - eio_luv: fix "unavailable signal" error on Windows (@talex5 ocaml-multicore/eio#420, reported by @nojb). - Fix `Buf_write.BE.uint48` and `LE.uint48` (@adatario ocaml-multicore/eio#418). Documentation: - Add example programs (@talex5 ocaml-multicore/eio#389). - Update network examples to use `run_server` (@talex5 ocaml-multicore/eio#417). - Add a warning to the tutorial about `Fiber.first` (@talex5 ocaml-multicore/eio#394). - Clarify the epoch used for `Eio.Time.now` (@bikallem ocaml-multicore/eio#395). - Describe `secure_random` as an infinite source (@patricoferris ocaml-multicore/eio#426). - Update README for OCaml 5 release (@talex5 ocaml-multicore/eio#384 ocaml-multicore/eio#391 ocaml-multicore/eio#393). Other changes: - Delay setting `SIGPIPE` handler until the `run` function is called (@talex5 ocaml-multicore/eio#420). - Remove debug-level logging (@talex5 ocaml-multicore/eio#403). - eio-luv: improve `process.md` test (@smondet ocaml-multicore/eio#414). - Update to Dune 3 (@talex5 ocaml-multicore/eio#410). - Remove test dependency on Astring (@talex5 ocaml-multicore/eio#402 ocaml-multicore/eio#404). - Simplify cancellation logic (@talex5 ocaml-multicore/eio#396). - time: `Mtime.Spand.to_s` has been deprecated in mtime 2.0.0 (@bikallem ocaml-multicore/eio#385).
CHANGES: New features: - Add `Eio.Net.run_server` (@bikallem @talex5 ocaml-multicore/eio#408). Runs an accept loop in one or more domains, with cancellation and graceful shutdown, and an optional maximum number of concurrent connections. - Add `Buf_read.BE` and `LE` parsers (@Cjen1 ocaml-multicore/eio#399). Parse numbers in various binary formats. - Add `Eio.Buf_read.uint8` (@talex5 ocaml-multicore/eio#418). Performance: - Make `Eio.Condition` lock-free (@talex5 ocaml-multicore/eio#397 ocaml-multicore/eio#381). In addition to being faster, this allows using conditions in signal handlers. - Make `Eio.Semaphore` lock-free (@talex5 @polytypic ocaml-multicore/eio#398). - Make `Eio.Stream` lock-free when the capacity is zero (@talex5 ocaml-multicore/eio#413 ocaml-multicore/eio#411). - Make `Eio.Promise` lock-free (@talex5 ocaml-multicore/eio#401). Bug fixes: - eio_linux: call `Uring.submit` as needed (@talex5 @bikallem ocaml-multicore/eio#428). Previously, we could fail to submit a job promptly because the SQE queue was full. - Fix luv signals (@haesbaert ocaml-multicore/eio#412). `libuv` automatically retries polling if it gets `EINTR`, without giving OCaml signal handlers a chance to run. - eio_luv: fix some resource leaks (@talex5 @patricoferris ocaml-multicore/eio#421). - eio_luv: fix "unavailable signal" error on Windows (@talex5 ocaml-multicore/eio#420, reported by @nojb). - Fix `Buf_write.BE.uint48` and `LE.uint48` (@adatario ocaml-multicore/eio#418). Documentation: - Add example programs (@talex5 ocaml-multicore/eio#389). - Update network examples to use `run_server` (@talex5 ocaml-multicore/eio#417). - Add a warning to the tutorial about `Fiber.first` (@talex5 ocaml-multicore/eio#394). - Clarify the epoch used for `Eio.Time.now` (@bikallem ocaml-multicore/eio#395). - Describe `secure_random` as an infinite source (@patricoferris ocaml-multicore/eio#426). - Update README for OCaml 5 release (@talex5 ocaml-multicore/eio#384 ocaml-multicore/eio#391 ocaml-multicore/eio#393). Other changes: - Delay setting `SIGPIPE` handler until the `run` function is called (@talex5 ocaml-multicore/eio#420). - Remove debug-level logging (@talex5 ocaml-multicore/eio#403). - eio-luv: improve `process.md` test (@smondet ocaml-multicore/eio#414). - Update to Dune 3 (@talex5 ocaml-multicore/eio#410). - Remove test dependency on Astring (@talex5 ocaml-multicore/eio#402 ocaml-multicore/eio#404). - Simplify cancellation logic (@talex5 ocaml-multicore/eio#396). - time: `Mtime.Spand.to_s` has been deprecated in mtime 2.0.0 (@bikallem ocaml-multicore/eio#385).
CHANGES: New features: - Add `Eio.Net.run_server` (@bikallem @talex5 ocaml-multicore/eio#408). Runs an accept loop in one or more domains, with cancellation and graceful shutdown, and an optional maximum number of concurrent connections. - Add `Buf_read.BE` and `LE` parsers (@Cjen1 ocaml-multicore/eio#399). Parse numbers in various binary formats. - Add `Eio.Buf_read.uint8` (@talex5 ocaml-multicore/eio#418). Performance: - Make `Eio.Condition` lock-free (@talex5 ocaml-multicore/eio#397 ocaml-multicore/eio#381). In addition to being faster, this allows using conditions in signal handlers. - Make `Eio.Semaphore` lock-free (@talex5 @polytypic ocaml-multicore/eio#398). - Make `Eio.Stream` lock-free when the capacity is zero (@talex5 ocaml-multicore/eio#413 ocaml-multicore/eio#411). - Make `Eio.Promise` lock-free (@talex5 ocaml-multicore/eio#401). Bug fixes: - eio_linux: call `Uring.submit` as needed (@talex5 @bikallem ocaml-multicore/eio#428). Previously, we could fail to submit a job promptly because the SQE queue was full. - Fix luv signals (@haesbaert ocaml-multicore/eio#412). `libuv` automatically retries polling if it gets `EINTR`, without giving OCaml signal handlers a chance to run. - eio_luv: fix some resource leaks (@talex5 @patricoferris ocaml-multicore/eio#421). - eio_luv: fix "unavailable signal" error on Windows (@talex5 ocaml-multicore/eio#420, reported by @nojb). - Fix `Buf_write.BE.uint48` and `LE.uint48` (@adatario ocaml-multicore/eio#418). Documentation: - Add example programs (@talex5 ocaml-multicore/eio#389). - Update network examples to use `run_server` (@talex5 ocaml-multicore/eio#417). - Add a warning to the tutorial about `Fiber.first` (@talex5 ocaml-multicore/eio#394). - Clarify the epoch used for `Eio.Time.now` (@bikallem ocaml-multicore/eio#395). - Describe `secure_random` as an infinite source (@patricoferris ocaml-multicore/eio#426). - Update README for OCaml 5 release (@talex5 ocaml-multicore/eio#384 ocaml-multicore/eio#391 ocaml-multicore/eio#393). Other changes: - Delay setting `SIGPIPE` handler until the `run` function is called (@talex5 ocaml-multicore/eio#420). - Remove debug-level logging (@talex5 ocaml-multicore/eio#403). - eio-luv: improve `process.md` test (@smondet ocaml-multicore/eio#414). - Update to Dune 3 (@talex5 ocaml-multicore/eio#410). - Remove test dependency on Astring (@talex5 ocaml-multicore/eio#402 ocaml-multicore/eio#404). - Simplify cancellation logic (@talex5 ocaml-multicore/eio#396). - time: `Mtime.Spand.to_s` has been deprecated in mtime 2.0.0 (@bikallem ocaml-multicore/eio#385).
CHANGES: New features: - Add `Eio.Net.run_server` (@bikallem @talex5 ocaml-multicore/eio#408). Runs an accept loop in one or more domains, with cancellation and graceful shutdown, and an optional maximum number of concurrent connections. - Add `Buf_read.BE` and `LE` parsers (@Cjen1 ocaml-multicore/eio#399). Parse numbers in various binary formats. - Add `Eio.Buf_read.uint8` (@talex5 ocaml-multicore/eio#418). Performance: - Make `Eio.Condition` lock-free (@talex5 ocaml-multicore/eio#397 ocaml-multicore/eio#381). In addition to being faster, this allows using conditions in signal handlers. - Make `Eio.Semaphore` lock-free (@talex5 @polytypic ocaml-multicore/eio#398). - Make `Eio.Stream` lock-free when the capacity is zero (@talex5 ocaml-multicore/eio#413 ocaml-multicore/eio#411). - Make `Eio.Promise` lock-free (@talex5 ocaml-multicore/eio#401). Bug fixes: - eio_linux: call `Uring.submit` as needed (@talex5 @bikallem ocaml-multicore/eio#428). Previously, we could fail to submit a job promptly because the SQE queue was full. - Fix luv signals (@haesbaert ocaml-multicore/eio#412). `libuv` automatically retries polling if it gets `EINTR`, without giving OCaml signal handlers a chance to run. - eio_luv: fix some resource leaks (@talex5 @patricoferris ocaml-multicore/eio#421). - eio_luv: fix "unavailable signal" error on Windows (@talex5 ocaml-multicore/eio#420, reported by @nojb). - Fix `Buf_write.BE.uint48` and `LE.uint48` (@adatario ocaml-multicore/eio#418). Documentation: - Add example programs (@talex5 ocaml-multicore/eio#389). - Update network examples to use `run_server` (@talex5 ocaml-multicore/eio#417). - Add a warning to the tutorial about `Fiber.first` (@talex5 ocaml-multicore/eio#394). - Clarify the epoch used for `Eio.Time.now` (@bikallem ocaml-multicore/eio#395). - Describe `secure_random` as an infinite source (@patricoferris ocaml-multicore/eio#426). - Update README for OCaml 5 release (@talex5 ocaml-multicore/eio#384 ocaml-multicore/eio#391 ocaml-multicore/eio#393). Other changes: - Delay setting `SIGPIPE` handler until the `run` function is called (@talex5 ocaml-multicore/eio#420). - Remove debug-level logging (@talex5 ocaml-multicore/eio#403). - eio-luv: improve `process.md` test (@smondet ocaml-multicore/eio#414). - Update to Dune 3 (@talex5 ocaml-multicore/eio#410). - Remove test dependency on Astring (@talex5 ocaml-multicore/eio#402 ocaml-multicore/eio#404). - Simplify cancellation logic (@talex5 ocaml-multicore/eio#396). - time: `Mtime.Spand.to_s` has been deprecated in mtime 2.0.0 (@bikallem ocaml-multicore/eio#385).
This also changes the semaphore benchmark. Before, two domains just passed control back and forth, which was really just timing how long it takes to wake a domain. Now, we keep the domains busy while waiting for the semaphore. We also now have 4 domains trying to use 2-4 resources, which is a more realistic use of a semaphore.
This is a little faster in the single domain case, and quite a bit faster with multiple domains: