-
Notifications
You must be signed in to change notification settings - Fork 122
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use-after free in Op::connect
#125
Comments
Note that the tokio-uring/src/driver/accept.rs Lines 15 to 18 in 1c0a2bf
I still don't think that's enough, but there's much fewer chances of that heap-allocated memory being overwritten by the time the kernel gets to it. |
Ah, the accept situation is a bit different since it's an out pointer. I see what holds the data now - the So, Boxing the address would fix the connect thing for good, but I still feel like having the "future handle" hold associated data is the wrong model. If one were to drop the connect/accept future before the operation completes, the kernel would be reading from / writing to invalid memory - same lifetime concerns as buffers. |
Is it the accept op or the connection op that you think is dropping the fd too early? I don't know all the details of the latest Completable design but I hadn't had problems with the earlier one. I thought the code paths kept the fd in a SharedFd that only got dropped when the Op was dropped and the operation had completed per a cqe entry. Oh, not the fd, the address? The Connect struct in connect.rs takes ownership of the SockAddr. How is that being dropped before the Op is dropped? I don't see unsafe being used anywhere around the connect.socket_addr so I'm wondering about the address of the sock address being passed to the kernel. |
Exactly. As things stand now, both accept and connect are unsafe:
That model is flawed imho - both the inputs AND the outputs must be kept in the slab so they never move until the operation is successfully completed OR successfully cancelled. The future handle must be just that - a handle. |
What version of kernel are you testing on? For sometime, the kernel driver folks have said the data only needs to be good |
Confused by this. The kernel doesn't write to any of our data, it returns a result and some flags bits. Do you mean you think the kernel has kept the original SockAddr pointers during its accept? I didn't see that documented anywhere in their man pages but I've never tried looking at their kernel code. |
This is on:
Submissions are batched: if the queue isn't full,
That is what I think is happening, yes - we're giving the kernel a place to store the local_addr for the accepted connection: |
Accept is fine because on drop the state gets dropped into the driver (Lifecycle::Ignored). |
That's true! That means this issue was closed by #126. |
I was wondering why connecting to IPv4 addresses sometimes worked.. but IPv6 doesn't. Sometimes I got an errno 97 (address family not supported), sometimes it was just hanging... and adding a couple
dbg!
statements made it work?All of this smells a lot like memory unsafety.. and it is!
Here, a
Connect
struct is created, moving aSockAddr
into it:tokio-uring/src/driver/connect.rs
Lines 17 to 30 in 1c0a2bf
Then, in
submit_with
, the data is moved into anOp
struct, the callback is called to actually configure the SQE from the given data...tokio-uring/src/driver/op.rs
Lines 74 to 78 in 1c0a2bf
And then
op
is dropped (and the data with it). We now have an op in the submission queue that refers to some userspace address that's been freed (somewhere on the stack, presumably). This probably works well in tests because immediately after attempting to connect, it sleeps. But we now have the kernel using some memory after we've freed it.A potential fix would be to move the data into the
Lifecycle::Submitted
variant - but that of course raises the question of: should it be boxed? Probably not right? So there's only so much data we can stuff in there - maybeconnect
is one of the rare ops that needs something like this, so maybe it should itself contain an enum?At any rate, that explains the random failures I've been getting.
The text was updated successfully, but these errors were encountered: