driver: MultiCqe support for Driver Drop #164
Conversation
@Noah-Kennedy @FrankReh Could I get a review? I'm keen to unblock #123
I've another commit for this, in progress.
I like it. Wish it were testable, because it is getting just complicated enough now that we might be overlooking something. The Rust compiler and our ability to reason through the logic make me pretty sure this does what is intended.
@FrankReh: I've made an update optimizing the drop logic.
I do wonder about both this and the original though. How do we know we've awaited all the AsyncCancels we've pushed? I don't think either the original or this handles that properly.
It doesn't need to wait for the AsyncCancels to be done. It uses a loop at the bottom to wait for the originals to move to complete, I believe. I'll double-check that now.
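For what it's worth, here is a minimal sketch of the cancel-then-drain shape being discussed, written against the io-uring crate directly; the `in_flight` bookkeeping, the `cancel_and_drain` name, and the `u64::MAX` marker for cancel CQEs are illustrative assumptions, not the PR's actual code:

```rust
use io_uring::{opcode, IoUring};

/// Sketch only: push one AsyncCancel per outstanding op, then wait until every
/// original op (not the cancels themselves) has shown up in the completion queue.
fn cancel_and_drain(ring: &mut IoUring, in_flight: &mut Vec<u64>) -> std::io::Result<()> {
    const CANCEL_MARKER: u64 = u64::MAX; // lets us ignore CQEs for the cancels

    for &user_data in in_flight.iter() {
        let cancel = opcode::AsyncCancel::new(user_data)
            .build()
            .user_data(CANCEL_MARKER);
        // If the submission queue is full, flush it and retry.
        while unsafe { ring.submission().push(&cancel) }.is_err() {
            ring.submit()?;
        }
    }
    ring.submit()?;

    // Loop until every original op has completed or been cancelled.
    while !in_flight.is_empty() {
        ring.submit_and_wait(1)?;
        let seen: Vec<u64> = ring.completion().map(|cqe| cqe.user_data()).collect();
        in_flight.retain(|ud| !seen.contains(ud));
    }
    Ok(())
}
```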
Odd, this page. My approval shouldn't be counted if another change comes in.
Looking at the wrong section.
@FrankReh that's a repo setting.
@FrankReh I'm happy that this and the original await all the ops in the slab. I just don't know if it's safe to drop the rings if there are in-flight AsyncCancels, which we don't account for.
@FrankReh, I'm going to have one more edit. I think we can do the wait logic in a single pass of the slab.
Ignore that - it's too big a change. I'm happy with this PR now. I will not be pressing merge; I'll leave that to @FrankReh. Interesting - that's gone from being possible to me not being on the list, which is what I expected.
Not sure I like this version as much. Do I misunderstand what's going on in that final loop?
@FrankReh Cancel is called exactly once for each outstanding op. The logic is pretty much the same as the original, only without a vec.
I've never read a warning in the liburing documentation about having to wait for quiescence before closing a uring fd. I believe we should proceed as if it is safe. If you want, I or one of us can pose the question to the liburing author as an issue on their repo, but I really think the answer is that the kernel protects itself from the uring fd being closed and doesn't need userland to wait for things to be cleaned up. So to be clear, I think it's a good idea that we make the drop wait to see the active operations completed, but we don't need to wait to see our async cancels themselves being completed. It's only against the kernel modifying userland data that we are trying to protect ourselves, and async cancels don't use userland buffers.
@FrankReh The number of defunct processes I've had whilst doing dev this week says otherwise.
Fair enough. Let me look through it again.
Although, maybe AsyncCancels are OK. I don't know. I've been ungracefully exiting with send_zc buffers in flight.
I'm not convinced we should use want=1 when waiting. The io-uring crate may provide a wait with a timeout, however. Or a scheme that just busy-waits seems safer.
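As a hedged aside, the io-uring crate does appear to offer a timed wait via submit_with_args (subject to crate version and kernel support for IORING_ENTER_EXT_ARG); a sketch of what replacing the bare want=1 could look like:

```rust
use io_uring::types::{SubmitArgs, Timespec};
use io_uring::IoUring;

// Sketch: wait for at least one CQE, but give up after a second instead of
// potentially hanging forever on want=1. A timeout surfaces as an error
// (typically ETIME), which the caller can treat as "nothing yet, go around again".
fn wait_one_with_timeout(ring: &IoUring) -> std::io::Result<usize> {
    let ts = Timespec::new().sec(1);
    let args = SubmitArgs::new().timespec(&ts);
    ring.submitter().submit_with_args(1, &args)
}
```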
@ollie-etl I don't think you should be trying to get this mergeable today. I'm assuming you want it in the release, but if a release goes out this weekend like Noah wondered about, I think we should have a follow-up release very soon after that with the PRs that have been consuming so much of our time over the last two weeks. I wouldn't even try to get any of my work into the follow-on. It would be purely to catch up with the current PRs that were ready.
@FrankReh I think I've addressed the quadratic behaviour |
@frank: Now I've seen the path to a single traversal, what do you think about refactoring the whole lot from 3 passes to 1? That would imply we wait for each op to complete or be cancelled before moving on to the next.
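To make the idea concrete, here is a rough sketch of that single traversal (illustrative only: a real driver pass would also have to record CQEs belonging to other ops and to multishot completion lists rather than discard them):

```rust
use io_uring::{opcode, IoUring};

// Sketch: cancel and drain one op at a time in a single pass over the
// outstanding user_data values, waiting for each before moving on.
fn drain_one_by_one(ring: &mut IoUring, in_flight: &[u64]) -> std::io::Result<()> {
    for &user_data in in_flight {
        let cancel = opcode::AsyncCancel::new(user_data)
            .build()
            .user_data(u64::MAX);
        while unsafe { ring.submission().push(&cancel) }.is_err() {
            ring.submit()?;
        }

        // Wait until this particular op's CQE arrives. Note: this discards
        // CQEs for other ops, which a real implementation would track instead.
        let mut done = false;
        while !done {
            ring.submit_and_wait(1)?;
            done = ring.completion().any(|cqe| cqe.user_data() == user_data);
        }
    }
    Ok(())
}
```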
I approve but I want @Noah-Kennedy to have a look and approve too. This is significant.
I hate to ask for comment fixes when Noah hasn't reviewed yet. But the comments in the right place might make it easier for him to reason through too.
@ollie-etl Yes, easy to look in the wrong section. So for @Noah-Kennedy's benefit, we no longer think this fixes his earlier version. This just makes it possible to handle CompletionList entries too. But I probably like the last loop here better, even without counting the handling of CompletionList being included here.
@FrankReh Comments addressed |
I would like to see this go in if @Noah-Kennedy finds it passes all the stress tests he had run before. I would expect this to be tweaked down the road, at least as we look for a way that avoids a possible wait hang with want=1. But maybe not before what could be today's release.
@ollie-etl please run
@Noah-Kennedy Done.
@Noah-Kennedy I think this needs an approval from you; again it shows as blocked on approval. I guess the old one is stale?
Handles LifeCycle::CompletionList in driver Drop logic

During the first pass over all lifecycles, identifies whether CompletionLists are finished or not. This means that after this pass, we will never observe CompletionList in the lifecycles, so the existing termination logic can remain unchanged. The ignored entry will stay until removed from the Lifecycle slab by the call to complete.

Fixes a gap left by #158 and unblocks #123.
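As an illustration of the description above, a minimal sketch of what the first pass could look like. The Lifecycle enum here is deliberately simplified (the real tokio-uring type carries wakers, CQE data, and slab-list indices), and the more_to_come flag and first_pass name are assumptions standing in for however the list records that further completions are expected:

```rust
// Simplified, illustrative types; not tokio-uring's actual definitions.
#[allow(dead_code)]
enum Lifecycle {
    Submitted,
    Ignored,
    Completed,
    CompletionList { more_to_come: bool },
}

/// First pass: after this runs, the slab never contains CompletionList, so the
/// existing termination logic can run unchanged. Unfinished lists become
/// Ignored placeholders that stay until the final completion removes them.
fn first_pass(slab: &mut slab::Slab<Lifecycle>) {
    let keys: Vec<usize> = slab.iter().map(|(key, _)| key).collect();
    for key in keys {
        let list_state = match &slab[key] {
            Lifecycle::CompletionList { more_to_come } => Some(*more_to_come),
            _ => None,
        };
        match list_state {
            Some(true) => slab[key] = Lifecycle::Ignored, // still expecting CQEs
            Some(false) => {
                slab.remove(key); // list already finished: drop the entry now
            }
            None => {}
        }
    }
}
```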