-
Notifications
You must be signed in to change notification settings - Fork 748
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
tcp_stream test fails with TcpStream::set_ttl crash #1117
Comments
Should we document that set_ttl (and maybe more methods) require a writable (or readable?) event first? |
Not sure if documenting is enough, someone might miss reading the docs, also there might be APIs sensitive to connect status on other platforms too. In my case, the crash was not always reproducing, if I enabled/disabled other tests from the same test file. Maybe implement support in mio for checking if a stream is connected? Or even better, have the APIs sensitive to connect status, to check for connect and return an error in case socket is not connected yet. |
@Thomasdezeeuw please set windows flag on this issue |
I don't entirely agree with documenting is not enough, although an API that can't be misused would be better, documentation is sometimes simply the best we can do without introducing overhead.
How do propose we do this, without introducing (too much) overhead? If I remember correctly somewhere, at some point, Mio documentation said that |
@Thomasdezeeuw thinking out loud here, on Linux you are allowed to call setsockopt w/ IP_TTL before connect, while on Windows set_ttl requires the socket to be connected. An application using mio interface set_ttl would have to have a different implementation for Linux and Windows, maybe we don't want that, not sure if this is a design goal of mio. Otherwise, just documenting it seem reasonable ATM. A possible solution would be using completion ports the same way they are used for simulate polling. A connected stream event is just another poll for WRITABLE for that stream. Here's some pseudocode below.
I'm not sure about the overhead introduced by this solution, how it would affect performance, could be minimal sincer we'd just reuse the code already present for simulating poll by using AFD driver. |
@dtacalau Mio is currently missing benchmarks, in the v0.6 -> v0.7 transition a lot of the API changed, but its something to add again after initial v0.7 release. As for keep track of whether or not the stream is connected; that would require (at least) a boolean value. But if we document that the What do you think? |
@Thomasdezeeuw not sure if it's possible to pretend stream is only ready after an initial read/writable event has been received on all mio supported platforms. What if mio will need to offer interface for setting a socket option which requires a not connected stream, the exact opposite of set_ttl. That's why I'm thinking it is safer to handle the issue per specific option, depending on the option needs. Or maybe introduce a new interest called CONFIGURABLE, ability to configure behavior of a stream (call setsockopt on a stream sock), which will be platform specific behind the scenes. So, on Windows, taking set_ttl as example:
Anyway, if we can pretend ATM that all currently supported options require a connected stream, I'm fine with just documenting, just keep in mind the opposite case :). |
@Thomasdezeeuw I could also implement some benchmarks and have the CI running them along with the rest of the tests (or ressurect old benchmarks). Let me know what you think! |
…s, this test was crashing on Windows because, on Windows calling setsockopt on a not connected socket is undefined behavior. More details can be found in issue tokio-rs#1117. Signed-off-by: Daniel Tacalau <dst4096@gmail.com>
@Thomasdezeeuw to wrap up our discussion: I've commited a fix for the set_ttl test, but not really satisfied with the fix (since this is only a Windows limitation, other platforms would not need the extra register and wait). Would like to know your thoughts on my ideas above: benchmarks, adding a new CONFIGURABLE interest. Thanks! |
This this is more or less the best we can do without introducing overhead elsewhere, if you're talking about registering before calling
I not really in favour of a configurable interest because I wouldn't know that would be? I think it would just be another name for writable on Windows? But then you have an awkward situation in which you'll get an writable event even though you only registered configurable interests. As for the benchmarks; there very welcome. I think the API is starting to stabilise so we I think the work to maintain them won't be too much. |
We should make the test work in windows or make the API a *nix only API IMO. I think we can make this work in windows by storing the |
After 60bf56b this works on Windows, but it does require the user to receive an event for it first. Do mean that calling |
@carllerche atm the ttl test works on windows, also nodelay tests work (they were similar issues). |
Mark from the Windows core networking team offered to help investigating this issue. Will keep this issue updated with the progress done and other interesting facts. First update follows. Mark was unable to reproduce it: "I've been working on reproing the TTL issue you mentioned, and I'm having some trouble doing so. I tried setting + getting the option on client and server TCP AF_INET sockets before any connection attempt is made, after client calls connect but before server accepts and after server accepts, and they pass the smoke test of being able to set, and then retrieve the same value. " " I tried this on my development machine, and on a Windows 8.1 pro 64-bit VM I spun up and both pass the tests. " He asked me to collect a time travel debugging trace of the repro. The issue does not reproduce when run under time travel debugging tool tttracer tool. I think tracing make things run slower and changes the timing. However I've prepared a branch where the issue reproduces almost always, I've test it under Windows 8.1 and also Windows 10:
|
Carl and I are continuing to investigate this issue. Some other findings include that running the test in isolation, and running the test with --test-threads 1 makes the repro go away. However, putting a 2 second sleep in the server thread with multi-threading enabled makes the repro 100%. Carl identified what looks like uninitialized reads using Dr Memory. So, overall the investigation is ongoing. |
Hey @MarkLivsc, it's Daniel not Carl :). In addition to what Mark said, more facts:
|
@Thomasdezeeuw please re-assign this issue to me, thanks. |
@dtacalau is there anything left to do here? |
@Thomasdezeeuw yes, this is wip. Me and Mark have continued investigating this. Biggest problem here is that the issue repro on my machine only. Mark was unable to repro this on his machines. Mark has prepared a set of custom drivers with more debug info which I installed on a machine where the issue repro. But unfortunately issue did not reproduce under custom drivers. Anything that slows down the machine renders the issue to not reproduce anymore. Next, Mark has prepared a minimalist repro test that just keep running until it hits the issue, so he's trying to isolate the issue from mio and rust and repro-ing it in another context. The most important part here is to find a way to reproduce it on other machines too. |
@dtacalau do you think this should block the v0.7 release? Or can we move it to the v1.0? |
@Thomasdezeeuw not a blocker for v0.7 release imo. |
We have a repro, this rust test always fails:
@MarkLivsc also implemented a test that runs N instances of essentially the test above (but written in C) in parallel. The issue repro with Mark's test but only when run multithreaded, when run on one thread it does not repro. The key here was the failing async connect, so now we can define the problem better: Since Mark now has a repro, he'll continue the investigation on his end and promissed to keep us posted with updates. @Thomasdezeeuw I think we can close this issue since there's nothing else to be done on rust/mio side. |
I prefer to keep this open as its still an issue. In any case maybe we should document it and maybe share the info with other project such as libuv. |
@Thomasdezeeuw it's already documented Line 164 in 0a55a12
|
The behavior observed and investigated in this issue has been documented in Win docs. @MarkLivsc pointed us to the doc location:
Closing this issue as we've already had the proper usage documented:
|
@dtacalau and @MarkLivsc thanks for figuring this out! |
Problem:
When running the unit tests on Windows 8.1 Pro, tcp_stream ttl test crash in TcpStrea::set_ttl :
Cause:
According to @PerfectLaugh:
Solution:
?
The text was updated successfully, but these errors were encountered: