-
Notifications
You must be signed in to change notification settings - Fork 22
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Avoid race condition #137
Merged
kazu-yamamoto
merged 1 commit into
kazu-yamamoto:main
from
edsko:finley/avoid-manager-race
Jul 24, 2024
Merged
Avoid race condition #137
kazu-yamamoto
merged 1 commit into
kazu-yamamoto:main
from
edsko:finley/avoid-manager-race
Jul 24, 2024
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This is a subtle one! There was a race between two thread managers: the `System.TimeManager.Manager` and the `Network.HTTP2.H2.Manager` (which wraps the former). The race condition could arise as follows: 1. Suppose an exception is raised in one of the background threads (e.g., the frame sender). 2. Then `stopAfter` writes a `Stop` instruction the `H2.Manager` queue; the exception gets then rethrown. 3. If the server is run in the scope of `withManager` (as it is done in `Network.Run.TCP.Timeout.runTCPServerWithSocket` for example), the exception is caught again, causing `stopManager` to be called. 4. Now one of two things can happen /first/ (for each managed thread): a. `stopManager` might call the timeout action, which will throw `TimeoutThread` to the thread (because we registered the threads with `registerKillThread`) b. The `H2.Manager` will process the `Stop` instruction, and throw `KilledByHTTP2ThreadManager` to the thread. The choice is non-deterministic, depending on thread scheduling by the RTS. 5. After this, the _other_ exception is then thrown to _same_ thread (which might at this point be busy handling the _first_ exception). So not only is the type of exception that is thrown non-deterministic, we also have the possibility that the thread will be interrupted handling that first exception by the second exception. This commit fixes this by putting the `H2.Manager` in charge (it is aware of both managers, after all). When `stopAfter` writes `Stop` to the `H2.Manager` queue, it waits until that `Stop` instruction has been processed; this will (after this commit) also ensure that the threads have been removed from the timeout manager. This guarantees that the exception that is delivered (and the _only_ exception that is delivered) is the `KilledByHTTP2ThreadManager` from `http2`. Co-authored-by: Edsko de Vries <edsko@well-typed.com>
Closed
edsko
added a commit
to well-typed/grapesy
that referenced
this pull request
Jul 19, 2024
This depends on kazu-yamamoto/http2#137. Closes #190.
kazu-yamamoto
approved these changes
Jul 24, 2024
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Merged. |
edsko
added a commit
to well-typed/grapesy
that referenced
this pull request
Jul 24, 2024
This depends on kazu-yamamoto/http2#137. Closes #190.
edsko
added a commit
to well-typed/grapesy
that referenced
this pull request
Jul 24, 2024
This depends on kazu-yamamoto/http2#137. Closes #190.
edsko
added a commit
to well-typed/grapesy
that referenced
this pull request
Jul 25, 2024
This depends on kazu-yamamoto/http2#137. Closes #190.
edsko
added a commit
to well-typed/grapesy
that referenced
this pull request
Jul 25, 2024
This depends on kazu-yamamoto/http2#137. Closes #190.
Merged
netbsd-srcmastr
pushed a commit
to NetBSD/pkgsrc
that referenced
this pull request
Jan 30, 2025
# ChangeLog for http2 ## 5.3.9 * Using `ThreadManager` of `time-manager`. ## 5.3.8 * `forkManagedTimeout` ensures that only one asynchronous exception is thrown. Fixing the thread leak via `Weak ThreadId` and `modifyTVar'`. [#156](kazu-yamamoto/http2#156) ## 5.3.7 * Using `withHandle` of time-manager. * Getting `Handle` for each thread. * Providing allocSimpleConfig' to enable customizing WAI tiemout manager. * Monitor option (-m) for h2c-client and h2c-server. ## 5.3.6 * Making `runIO` friendly with the new synchronism mechanism. [#152](kazu-yamamoto/http2#152) * Re-throwing asynchronous exceptions to prevent thread leak. * Simplifying the synchronism mechanism between workers and the sender. [#148](kazu-yamamoto/http2#148) ## 5.3.5 * Using `http-semantics` v0.3. * Deprecating `numberOfWorkers`. * Removing `unliftio`. * Avoid `undefined` in client. [#146](kazu-yamamoto/http2#146) ## 5.3.4 * Support stream cancellation [#142](kazu-yamamoto/http2#142) ## 5.3.3 * Enclosing IPv6 literal authority with square brackets. [#143](kazu-yamamoto/http2#143) ## 5.3.2 * Avoid unnecessary empty data frames at end of stream [#140](kazu-yamamoto/http2#140) * Removing unnecessary API from ServerIO ## 5.3.1 * Fix treatment of async exceptions [#138](kazu-yamamoto/http2#138) * Avoid race condition [#137](kazu-yamamoto/http2#137) ## 5.3.0 * New server architecture: spawning worker on demand instead of the worker pool. This reduce huge numbers of threads for streaming into only 2. No API changes but workers do not terminate quicly. Rather workers collaborate with the sender after queuing a response and finish after all response data are sent. * All threads are labeled with `labelThread`. You can see them by `listThreas` if necessary. ## 5.2.6 * Recover rxflow on closing. [#126](kazu-yamamoto/http2#126) * Fixing ClientSpec for stream errors. * Allowing negative window. (h2spec http2/6.9.2) * Update for latest http-semantics [#122](kazu-yamamoto/http2#124) ## 5.2.5 * Setting peer initial window size properly. [#123](kazu-yamamoto/http2#123) ## 5.2.4 * Update for latest http-semantics [#122](kazu-yamamoto/http2#122) * Measuring performance concurrently for h2c-client ## 5.2.3 * Update for latest http-semantics [#120](kazu-yamamoto/http2#120) * Enable containers 0.7 (ghc 9.10) [#117](kazu-yamamoto/http2#117) ## 5.2.2 * Mark final chunk as final [#116](kazu-yamamoto/http2#116)
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
See #136 for discussion.
I think the test failure we were seeing there is caused by the fact that
throwTo
depends on the exception being raised to the target thread (even if it perhaps is then handling it), and that was not always succeeding (specifically, we failed to kill the "H2 streaming supporter" thread spawned inNetwork.HTTP2.Client.Run.sendStreaming
). I'm not 100% sure why this happens, perhaps @kazu-yamamoto knows why throwing an exception to that thread would not succeed.Fortunately, however, we didn't quite need the strong guarantees that were provided by
kill
in #136: we don't need a guarantee that the thread has already been killed, we merely need a guarantee that the thread has been removed from the timeout manager, so that we avoid the race condition explained in #136.This PR therefore is identical to #136 except in one place: we signal that we are done (now called
signalTimeoutsDisabled
) after we have removed all threads from the timeout manager, but before we necessarily have killed them.I cannot reproduce the test failure after this. Test
should always send the connection preface first
was failing quite reliably for me (though only with+RTS -N
) before this fix. After the fix I ran it 1000 times and no failures, and I ran the full test suite 100 times, also no fails.