Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify SharedFd close logic #230

Closed
wants to merge 1 commit into from

Conversation

thomasbarrett
Copy link

@thomasbarrett thomasbarrett commented Feb 7, 2023

Currently, the SharedFd close logic is needlessly complex. The logic within SharedFd that prevents a Fd from being closed while any IO operations are in-flight is unnecessary because the rust borrow checker already enforces this.

@thomasbarrett
Copy link
Author

See this comment for an explanation of why this makes sense.

@FrankReh
Copy link
Collaborator

FrankReh commented Feb 7, 2023

See this comment for an explanation of why this makes sense.

Those examples might be clear to you, but they aren't to me. Do I really have to dig through the details of each to get the points you are making or is there a way you can highlight which lines I, or any reader, should focus on?

@FrankReh
Copy link
Collaborator

FrankReh commented Feb 7, 2023

I just got confirmation from the liburing team that it is racy to close a fd while in-flight operations may be working on it. They call it lazy file reference and I take that to mean they don't increment the reference when the SQE is first read, but only later, when they get around to start doing the work. axboe/liburing#782 (comment)

So our blocking safeguard on the close makes some sense. When we are trying to keep the user safe from themselves.

And I think

Currently, the SharedFd close logic is needlessly complex. The logic within SharedFd that prevents a Fd from being closed while any IO operations are in-flight is unnecessary because the rust borrow checker already enforces this.

is plan wrong. The rust borrow checker does not help us with tracking when operations have actually completed.

There are a number of ways a Future can be started and not waited for. And a File can be shared between tasks on the same thread, one for reading, one for writing, as an example. Either can call close. Is it a programming error to close while another operation may still be operating on it? Probably yes. But the compiler doesn't help us avoid that. The use of reference counts does.

But if some user wants to go into unsafe territory, I don't think this crate should block them. The question becomes, what would make for an elegant unsafe API to offer in addition the API that is developing for safe/easy use?

@thomasbarrett
Copy link
Author

thomasbarrett commented Feb 7, 2023

@FrankReh I should clarify my thoughts, but please correct me if I am saying something completely wrong. The phenomenon that I was trying to show with my examples is that the public facing API that tokio_uring exposes to users is the File type (and similar). As far as I can tell, the only time that a File (and similar) actually clones its SharedFd is in the (read_at, write_at, etc) operations. In all of these cases, we await the operation. Thus, the lifetime of the SharedFd clones is that same as the lifetime of the reference to &File held by the async method. In safe rust, it is impossible for library users to call File::close while an operation is still in progress (regardless of whether or not they await the result of the operation). So, the race condition:

it is racy to close a fd while in-flight operations may be working on it

should be impossible for library users to hit without unsafe code.

@thomasbarrett
Copy link
Author

I just got confirmation from the liburing team that it is racy to close a fd while in-flight operations may be working on it. They call it lazy file reference and I take that to mean they don't increment the reference when the SQE is first read, but only later, when they get around to start doing the work. axboe/liburing#782 (comment)

Thanks for reaching out btw. This is very useful to know :). Since the current implementation doesn't really impede what we are working on (fixed fds, into_raw_fd, uring_cmd, etc), l'm fine to close this and focus on other PRs.

@FrankReh
Copy link
Collaborator

FrankReh commented Feb 7, 2023

should be impossible for library users to hit without unsafe code.

@thomasbarrett That is not correct. Safe code allows starting a future and then not waiting for it to complete, see the select! macro.

Also, I have been able to make copies of a File by wrapping in Rc<RefCell<_>> so that two tasks could operation on the same File. Both task running on the same thread. One task could be waiting for an operation of the File and the other could be trying to close it.

@thomasbarrett
Copy link
Author

Also, I have been able to make copies of a File by wrapping in Rc<RefCell<_>> so that two tasks could operation on the same File. Both task running on the same thread. One task could be waiting for an operation of the File and the other could be trying to close it.

@FrankReh, while that is true for most operations, it does not apply to close. Close is special in that it takes ownership of the file. The rules of safe rust ensure that if close is called, there must not be any references (mutable or immutable) to File still in scope, regardless of whether or not those references are in another thread, another task in the same thread, in an rc, in a refcell, etc. If you have a counter-example, please share it.

    pub async fn close(self) -> io::Result<()> {
        self.fd.close().await
    }

@FrankReh
Copy link
Collaborator

FrankReh commented Feb 7, 2023

@thomasbarrett You are correct. I was incorrect in thinking there was a way to get closed called and not resort to unsafe code. Thought I knew that better but I haven't used an explicit close from rust yet.

So is it enough that futures can be dropped while an operation is outstanding or is there more worth investigating?

@thomasbarrett
Copy link
Author

So is it enough that futures can be dropped while an operation is outstanding or is there more worth investigating?

Awesome! Glad we are on the same page :) I spent a lot of time scratching my head over this one :P. I believe that this is all that needs to be done. I ran the CI pipeline on my fork, and all tests pass. Running this branch on my own project w/ more complex setups and a lot of logging also didn't raise any alarms.

@thomasbarrett
Copy link
Author

Hmm... there is more worth investigation. Notably, this implementation still suffers the same issue as the current implementation. When a future that is in-flight is dropped (cancelling), it isn't synchronously deleted. It still holds a reference to a SharedFd until the "cancelled" operation completes. See this issue for an example.

On this branch, this behavior results in panic!. On the current implementation, it causes the SharedFd implementation to "hang"

@thomasbarrett
Copy link
Author

Closing this PR to reduce noise.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants