
[ BUGFIX ] Fabrics connect timeouts #1711

Merged: 5 commits into release/2.7 on Aug 9, 2024

Conversation

tiagolobocastro
Contributor

Reactor block_on may prevent SPDK thread messages from running: messages queued on the thread ring are not polled while block_on is in progress, which can lead to their starvation.
A few uses of block_on remain, most of them during init/setup and therefore mostly harmless, though the Nexus Bdev destruction path, which runs blocking code, still contains a block_on.
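
To make the starvation hazard concrete, here is a minimal sketch. The reactor helpers (Reactor::block_on, Reactors::master().send_future) mirror io-engine's naming but are used illustratively here, and destroy_bdev_async is a hypothetical placeholder rather than an actual function:

```rust
// Sketch only: why parking a reactor core inside block_on is problematic.
// `destroy_bdev_async` is a hypothetical placeholder.
async fn destroy_bdev_async(name: &str) {
    // ... tear the bdev down asynchronously ...
    let _ = name;
}

fn teardown_blocking(name: &str) {
    // While block_on runs, this core is not polling its SPDK thread ring,
    // so thread messages queued by other cores sit there, starved.
    Reactor::block_on(destroy_bdev_async(name));
}

fn teardown_cooperative(name: String) {
    // Scheduling the future on the reactor lets it interleave with
    // thread-message polling instead of shutting it out.
    Reactors::master().send_future(async move {
        destroy_bdev_async(&name).await;
    });
}
```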


fix(nvmf/target): remove usage of block_on

Split creating from starting the subsystem.
This way we can start the subsystem on the master reactor and then move
on to the next SPDK subsystem.

Signed-off-by: Tiago Castro <tiagolobocastro@gmail.com>
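
A hedged sketch of the shape of this split (the type, signatures, and NQN string below are illustrative stand-ins, not the exact io-engine API): creation stays synchronous, and start() is awaited on the master reactor instead of being driven by block_on.

```rust
// Illustrative stand-ins for the real subsystem and error types.
struct NvmfSubsystem;
#[derive(Debug)]
struct Error;

impl NvmfSubsystem {
    /// Build the subsystem object only; purely synchronous, no listeners yet.
    fn create(_nqn: &str) -> Result<Self, Error> {
        Ok(NvmfSubsystem)
    }

    /// Add listeners and bring the subsystem up. Awaited on the master
    /// reactor, so nothing has to block_on it to completion.
    async fn start(&self) -> Result<(), Error> {
        Ok(())
    }
}

// Usage sketch, on the master reactor:
//   let ss = NvmfSubsystem::create("nqn.2024-08.io.example:sketch")?;
//   ss.start().await?;
//   // ...then move on to the next SPDK subsystem.
```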

fix(nexus-child/unplug): remove usage of block_on

Initially this block_on was added because the remove callback ran in a blocking
fashion, but that has since changed and unplug is now called from an async context.
As such, we don't need the block_on and can call the async code directly.
Also, simplify the completion notification: we can simply close the sender.

Signed-off-by: Tiago Castro <tiagolobocastro@gmail.com>
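
A small sketch of the simplified completion notification, assuming a tokio oneshot channel (the concrete channel type in io-engine may differ): dropping the sender is enough to wake the waiter, so no explicit send is required.

```rust
use tokio::sync::oneshot;

/// Sketch: unplug now runs in async context, so the work is awaited directly
/// instead of being wrapped in block_on.
async fn unplug(done: oneshot::Sender<()>) {
    // ... detach the child, close the device handle, etc. ...

    // Completion notification: just close the channel. Dropping the sender
    // wakes the receiver without sending a value.
    drop(done);
}

/// The waiter simply awaits the receiver; the Err it gets once the sender is
/// dropped is the "unplug finished" signal here.
async fn wait_for_unplug(rx: oneshot::Receiver<()>) {
    let _ = rx.await;
}
```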

fix(nvmx/qpair): return errno with absolute value

Otherwise a returned negative value translates into an unknown Errno.

Signed-off-by: Tiago Castro <tiagolobocastro@gmail.com>
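
Sketched with the nix crate's Errno type (an assumption about how the return code gets mapped; the real call site may differ): SPDK-style calls report failure as a negative errno, and converting the negative number directly yields UnknownErrno, hence the absolute value.

```rust
use nix::errno::Errno;

/// Map an SPDK-style return code (0 on success, -errno on failure) to a
/// Result. Converting the raw negative value would produce
/// Errno::UnknownErrno, so strip the sign first.
fn errno_from_rc(rc: i32) -> Result<(), Errno> {
    if rc == 0 {
        Ok(())
    } else {
        Err(Errno::from_i32(rc.abs()))
    }
}
```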

feat: allow custom fabrics connect timeout

Allows passing this via the NVMF_FABRICS_CONNECT_TIMEOUT environment variable.
Also defaults it to 3s for now, rather than 500ms.

Signed-off-by: Tiago Castro <tiagolobocastro@gmail.com>
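
A minimal sketch of reading the override (the variable name comes from the commit message; the accepted format, plain milliseconds here, and the helper name are assumptions):

```rust
use std::{env, time::Duration};

/// Fabrics connect timeout: NVMF_FABRICS_CONNECT_TIMEOUT overrides the
/// default of 3 seconds. This sketch treats the value as milliseconds; the
/// format actually accepted by io-engine may differ.
fn fabrics_connect_timeout() -> Duration {
    env::var("NVMF_FABRICS_CONNECT_TIMEOUT")
        .ok()
        .and_then(|v| v.parse::<u64>().ok())
        .map(Duration::from_millis)
        .unwrap_or_else(|| Duration::from_secs(3))
}
```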

@tiagolobocastro
Contributor Author

bors try

@bors-openebs-mayastor

try

Build failed:

@tiagolobocastro
Contributor Author

bors try

bors-openebs-mayastor bot pushed a commit that referenced this pull request Aug 6, 2024
@bors-openebs-mayastor

try

Build succeeded:

Review thread on io-engine/src/subsys/nvmf/target.rs (outdated, resolved)
@dsharma-dc
Contributor

Do we know the reason why historically these paths were using blocking on futures to complete?

@tiagolobocastro
Contributor Author

> Do we know the reason why historically these paths were using blocking on futures to complete?

The unplug code was originally not async, and I suspect that's why block_on was used.

The others I don't know; perhaps it was just used as a "shortcut"...

@tiagolobocastro
Contributor Author

Resolves #1710
bors merge

bors-openebs-mayastor bot pushed a commit that referenced this pull request Aug 9, 2024
1711: [ BUGFIX ] Fabrics connect timeouts r=tiagolobocastro a=tiagolobocastro

@tiagolobocastro
Contributor Author

oops typo
bors cancel

@bors-openebs-mayastor

Canceled.

Should this become an unsafe function?

Signed-off-by: Tiago Castro <tiagolobocastro@gmail.com>
@tiagolobocastro
Contributor Author

bors merge

@bors-openebs-mayastor

Build succeeded:

bors-openebs-mayastor bot merged commit 6939b15 into release/2.7 on Aug 9, 2024. 5 checks passed.
bors-openebs-mayastor bot deleted the fabrics-connect branch on August 9, 2024 at 11:03.