Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid race condition #137

Merged
merged 1 commit into from
Jul 24, 2024

Conversation

edsko
Copy link
Collaborator

@edsko edsko commented Jul 19, 2024

See #136 for discussion.

I think the test failure we were seeing there is caused by the fact that throwTo depends on the exception being raised to the target thread (even if it perhaps is then handling it), and that was not always succeeding (specifically, we failed to kill the "H2 streaming supporter" thread spawned in Network.HTTP2.Client.Run.sendStreaming). I'm not 100% sure why this happens, perhaps @kazu-yamamoto knows why throwing an exception to that thread would not succeed.

Fortunately, however, we didn't quite need the strong guarantees that were provided by kill in #136: we don't need a guarantee that the thread has already been killed, we merely need a guarantee that the thread has been removed from the timeout manager, so that we avoid the race condition explained in #136.

This PR therefore is identical to #136 except in one place: we signal that we are done (now called signalTimeoutsDisabled) after we have removed all threads from the timeout manager, but before we necessarily have killed them.

I cannot reproduce the test failure after this. Test should always send the connection preface first was failing quite reliably for me (though only with +RTS -N) before this fix. After the fix I ran it 1000 times and no failures, and I ran the full test suite 100 times, also no fails.

This is a subtle one! There was a race between two thread managers: the
`System.TimeManager.Manager` and the `Network.HTTP2.H2.Manager` (which wraps
the former). The race condition could arise as follows:

1. Suppose an exception is raised in one of the background threads (e.g.,
   the frame sender).

2. Then `stopAfter` writes a `Stop` instruction the `H2.Manager` queue;
   the exception gets then rethrown.

3. If the server is run in the scope of `withManager` (as it is done in
   `Network.Run.TCP.Timeout.runTCPServerWithSocket` for example), the
   exception is caught again, causing `stopManager` to be called.

4. Now one of two things can happen /first/ (for each managed thread):

   a. `stopManager` might call the timeout action, which will throw
      `TimeoutThread` to the thread
      (because we registered the threads with `registerKillThread`)

   b. The `H2.Manager` will process the `Stop` instruction, and throw
      `KilledByHTTP2ThreadManager` to the thread.

   The choice is non-deterministic, depending on thread scheduling by the RTS.

5. After this, the _other_ exception is then thrown to _same_ thread
   (which might at this point be busy handling the _first_ exception).

So not only is the type of exception that is thrown non-deterministic, we also
have the possibility that the thread will be interrupted handling that first
exception by the second exception.

This commit fixes this by putting the `H2.Manager` in charge (it is aware of
both managers, after all). When `stopAfter` writes `Stop` to the `H2.Manager`
queue, it waits until that `Stop` instruction has been processed; this will
(after this commit) also ensure that the threads have been removed from the
timeout manager. This guarantees that the exception that is delivered (and the
_only_ exception that is delivered) is the `KilledByHTTP2ThreadManager` from
`http2`.

Co-authored-by: Edsko de Vries <edsko@well-typed.com>
@edsko edsko mentioned this pull request Jul 19, 2024
edsko added a commit to well-typed/grapesy that referenced this pull request Jul 19, 2024
@kazu-yamamoto kazu-yamamoto self-requested a review July 24, 2024 06:57
Copy link
Owner

@kazu-yamamoto kazu-yamamoto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@kazu-yamamoto kazu-yamamoto merged commit ab43992 into kazu-yamamoto:main Jul 24, 2024
10 checks passed
@kazu-yamamoto
Copy link
Owner

Merged.
Thanks.

@edsko edsko deleted the finley/avoid-manager-race branch July 24, 2024 07:16
edsko added a commit to well-typed/grapesy that referenced this pull request Jul 24, 2024
edsko added a commit to well-typed/grapesy that referenced this pull request Jul 24, 2024
edsko added a commit to well-typed/grapesy that referenced this pull request Jul 25, 2024
edsko added a commit to well-typed/grapesy that referenced this pull request Jul 25, 2024
@kazu-yamamoto kazu-yamamoto mentioned this pull request Nov 26, 2024
netbsd-srcmastr pushed a commit to NetBSD/pkgsrc that referenced this pull request Jan 30, 2025
# ChangeLog for http2

## 5.3.9

* Using `ThreadManager` of `time-manager`.

## 5.3.8

* `forkManagedTimeout` ensures that only one asynchronous exception is
  thrown. Fixing the thread leak via `Weak ThreadId` and `modifyTVar'`.
  [#156](kazu-yamamoto/http2#156)

## 5.3.7

* Using `withHandle` of time-manager.
* Getting `Handle` for each thread.
* Providing allocSimpleConfig' to enable customizing WAI tiemout manager.
* Monitor option (-m) for h2c-client and h2c-server.

## 5.3.6

* Making `runIO` friendly with the new synchronism mechanism.
  [#152](kazu-yamamoto/http2#152)
* Re-throwing asynchronous exceptions to prevent thread leak.
* Simplifying the synchronism mechanism between workers and the sender.
  [#148](kazu-yamamoto/http2#148)

## 5.3.5

* Using `http-semantics` v0.3.
* Deprecating `numberOfWorkers`.
* Removing `unliftio`.
* Avoid `undefined` in client.
  [#146](kazu-yamamoto/http2#146)

## 5.3.4

* Support stream cancellation
  [#142](kazu-yamamoto/http2#142)

## 5.3.3

* Enclosing IPv6 literal authority with square brackets.
  [#143](kazu-yamamoto/http2#143)

## 5.3.2

* Avoid unnecessary empty data frames at end of stream
  [#140](kazu-yamamoto/http2#140)
* Removing unnecessary API from ServerIO

## 5.3.1

* Fix treatment of async exceptions
  [#138](kazu-yamamoto/http2#138)
* Avoid race condition
  [#137](kazu-yamamoto/http2#137)

## 5.3.0

* New server architecture: spawning worker on demand instead of the
  worker pool. This reduce huge numbers of threads for streaming into
  only 2. No API changes but workers do not terminate quicly. Rather
  workers collaborate with the sender after queuing a response and
  finish after all response data are sent.
* All threads are labeled with `labelThread`. You can see them by
  `listThreas` if necessary.

## 5.2.6

* Recover rxflow on closing.
  [#126](kazu-yamamoto/http2#126)
* Fixing ClientSpec for stream errors.
* Allowing negative window. (h2spec http2/6.9.2)
* Update for latest http-semantics
  [#122](kazu-yamamoto/http2#124)

## 5.2.5

* Setting peer initial window size properly.
  [#123](kazu-yamamoto/http2#123)

## 5.2.4

* Update for latest http-semantics
  [#122](kazu-yamamoto/http2#122)
* Measuring performance concurrently for h2c-client

## 5.2.3

* Update for latest http-semantics
  [#120](kazu-yamamoto/http2#120)
* Enable containers 0.7 (ghc 9.10)
  [#117](kazu-yamamoto/http2#117)

## 5.2.2

* Mark final chunk as final
  [#116](kazu-yamamoto/http2#116)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants