[ BUGFIX ] Fabrics connect timeouts #1711

tiagolobocastro · 2024-08-06T09:12:56Z

Reactor block_on may prevent SPDK thread messages from running, which can starve messages pulled from the thread ring, since the ring is not polled during the block_on.
A few uses remain, mostly during init setup and therefore mostly harmless, though the Nexus Bdev destruction, which runs in blocking code, still contains a block_on.
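
To illustrate the starvation described above, here is a toy model only. It is not the io-engine Reactor or any SPDK API: `ToyReactor`, `wait_without_polling` and `wait_with_polling` are hypothetical names, and the "thread ring" is just a `VecDeque` of closures. The point it tries to show is that a wait which never drains the ring cannot observe a completion that is itself delivered through the ring.

```rust
use std::cell::Cell;
use std::collections::VecDeque;
use std::rc::Rc;

/// A message queued on the (toy) thread ring.
type Msg = Box<dyn FnOnce()>;

/// Hypothetical stand-in for a reactor that owns a message ring.
struct ToyReactor {
    ring: VecDeque<Msg>,
}

impl ToyReactor {
    fn new() -> Self {
        Self { ring: VecDeque::new() }
    }

    /// Queue a message; it only runs when the ring is polled.
    fn send(&mut self, msg: Msg) {
        self.ring.push_back(msg);
    }

    /// Drain and run every queued message.
    fn poll_ring(&mut self) {
        while let Some(msg) = self.ring.pop_front() {
            msg();
        }
    }

    /// Models a block_on that never polls the ring: the wait is bounded
    /// here only so that the example terminates.
    fn wait_without_polling(&self, done: &Cell<bool>, max_spins: usize) -> bool {
        for _ in 0..max_spins {
            if done.get() {
                return true;
            }
        }
        false
    }

    /// Models a wait that keeps polling the ring between checks.
    fn wait_with_polling(&mut self, done: &Cell<bool>, max_spins: usize) -> bool {
        for _ in 0..max_spins {
            self.poll_ring();
            if done.get() {
                return true;
            }
        }
        false
    }
}

/// Returns (completed without polling, completed with polling).
pub fn demo() -> (bool, bool) {
    let done = Rc::new(Cell::new(false));
    let mut reactor = ToyReactor::new();

    // The completion is delivered as a message on the ring.
    let flag = Rc::clone(&done);
    reactor.send(Box::new(move || flag.set(true)));

    let starved = reactor.wait_without_polling(&done, 1_000);
    let polled = reactor.wait_with_polling(&done, 1_000);
    (starved, polled)
}
```

In this model `demo()` reports that the non-polling wait never completes while the polling wait does, which is the shape of the problem the commits below work around by removing block_on from those paths.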

fix(nvmf/target): remove usage of block_on

Split creating from starting the subsystem.
This way we can start the subsystem in master reactor, and then move
to the next spdk subsystem.

Signed-off-by: Tiago Castro <tiagolobocastro@gmail.com>

fix(nexus-child/unplug): remove usage of block_on

Initially this block_on was added because the remove callback was running in blocking
fashion, but this has since changed and unplug is actually called from async context.
As such, we don't need the block_on and simply call the async code directly.
Also, simplify complete notification, as we can simply close the sender.

Signed-off-by: Tiago Castro <tiagolobocastro@gmail.com>
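
A minimal sketch of "complete by closing the sender", under stated assumptions: it uses `std::sync::mpsc` and a plain thread as stand-ins, whereas the real unplug path is async and may use a different channel type. `unplug_and_wait` is a hypothetical name. The idea shown is that no explicit "done" message is needed, because dropping the sender is already observable on the receiving side.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical helper: runs some teardown work and reports whether the
/// waiter observed completion via the sender being closed.
pub fn unplug_and_wait() -> bool {
    let (done_tx, done_rx) = mpsc::channel::<()>();

    let worker = thread::spawn(move || {
        // ... teardown work would happen here ...
        // Closing (dropping) the sender is the completion notification.
        drop(done_tx);
    });

    // With no message ever sent, recv() returns Err(RecvError) once every
    // sender has been dropped, which we treat as "complete".
    let completed = matches!(done_rx.recv(), Err(mpsc::RecvError));
    worker.join().expect("worker thread panicked");
    completed
}
```

The waiter cannot miss the notification even if the worker exits early, since the sender is dropped on every exit path of the closure.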

fix(nvmx/qpair): return errno with absolute value

Otherwise a returned negative value translates into an unknown Errno.

Signed-off-by: Tiago Castro <tiagolobocastro@gmail.com>
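
A small sketch of why the sign matters, assuming the C-style convention of returning `-errno` on failure. The helper names `errno_from_rc` and `errno_name` are hypothetical, and the lookup table is hand-written with Linux errno values rather than taken from the errno type the io-engine actually uses.

```rust
/// Hypothetical helper: convert a C-style return code (0 on success,
/// -errno on failure) into a positive errno value.
pub fn errno_from_rc(rc: i32) -> Option<i32> {
    if rc == 0 {
        None
    } else {
        // saturating_abs avoids overflow on i32::MIN.
        Some(rc.saturating_abs())
    }
}

/// Hand-written subset of Linux errno values, for illustration only.
pub fn errno_name(errno: i32) -> &'static str {
    match errno {
        5 => "EIO",
        6 => "ENXIO",
        110 => "ETIMEDOUT",
        _ => "unknown errno",
    }
}
```

Looking up the raw negative value (for example `errno_name(-110)`) falls through to the unknown case, while `errno_name(errno_from_rc(-110).unwrap())` resolves to `ETIMEDOUT`.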

feat: allow custom fabrics connect timeout

Allows passing this via env NVMF_FABRICS_CONNECT_TIMEOUT.
Also defaults it to 3s for now, rather than 500ms.

Signed-off-by: Tiago Castro <tiagolobocastro@gmail.com>
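
A hedged sketch of reading the override from the environment. Only the variable name `NVMF_FABRICS_CONNECT_TIMEOUT` and the 3s default come from this PR; the value format is an assumption (a plain integer number of milliseconds), and `parse_timeout` / `fabrics_connect_timeout` are hypothetical helper names, not the io-engine's actual option parsing.

```rust
use std::time::Duration;

/// Default taken from the commit message above.
const DEFAULT_FABRICS_CONNECT_TIMEOUT: Duration = Duration::from_secs(3);

/// Hypothetical parser. ASSUMPTION: the value is an integer number of
/// milliseconds; missing, unparsable or zero values fall back to the default.
pub fn parse_timeout(raw: Option<&str>) -> Duration {
    raw.and_then(|s| s.trim().parse::<u64>().ok())
        .filter(|ms| *ms > 0)
        .map(Duration::from_millis)
        .unwrap_or(DEFAULT_FABRICS_CONNECT_TIMEOUT)
}

/// Reads the override from the process environment.
pub fn fabrics_connect_timeout() -> Duration {
    let raw = std::env::var("NVMF_FABRICS_CONNECT_TIMEOUT").ok();
    parse_timeout(raw.as_deref())
}
```

Keeping the parsing separate from the environment lookup makes the fallback behaviour easy to check without mutating process-wide state.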

io-engine/src/bdev/nvmx/uri.rs

tiagolobocastro · 2024-08-06T09:18:54Z

bors try

bors-openebs-mayastor · 2024-08-06T10:01:25Z

try

Build failed:

continuous-integration/jenkins/branch

tiagolobocastro · 2024-08-06T16:15:09Z

bors try

bors-openebs-mayastor · 2024-08-06T17:07:15Z

try

Build succeeded:

continuous-integration/jenkins/branch

io-engine/src/subsys/nvmf/target.rs

dsharma-dc · 2024-08-09T04:39:38Z

Do we know the reason why historically these paths were using blocking on futures to complete?

tiagolobocastro · 2024-08-09T09:05:24Z

> Do we know the reason why historically these paths were using blocking on futures to complete?

The unplug code was originally not async, and I suspect that's why block_on was used.

The others I don't know; perhaps it was just used as a "shortcut"...

tiagolobocastro · 2024-08-09T09:54:35Z

Resolves #1710
bors merge

tiagolobocastro · 2024-08-09T09:55:20Z

oops typo
bors cancel

bors-openebs-mayastor · 2024-08-09T09:55:23Z

Canceled.

tiagolobocastro · 2024-08-09T09:57:29Z

bors merge

bors-openebs-mayastor · 2024-08-09T11:03:41Z

Build succeeded:

continuous-integration/jenkins/branch

tiagolobocastro requested review from dsavitskiy, abhilashshetty04 and dsharma-dc August 6, 2024 09:12

auto-assign bot requested a review from hrudaya21 August 6, 2024 09:16

tiagolobocastro commented Aug 6, 2024

io-engine/src/bdev/nvmx/uri.rs (outdated)

tiagolobocastro force-pushed the fabrics-connect branch from 182d2f7 to e0f2b9a August 6, 2024 16:13

tiagolobocastro added 3 commits August 6, 2024 17:14

feat: allow custom fabrics connect timeout

b088f32

Allows passing this via env NVMF_FABRICS_CONNECT_TIMEOUT. Also defaults it to 1s for now, rather than 500ms. Signed-off-by: Tiago Castro <tiagolobocastro@gmail.com>

fix(nvmx/qpair): return errno with absolute value

ec02339

Otherwise a returned negative value translates into an unknown Errno. Signed-off-by: Tiago Castro <tiagolobocastro@gmail.com>

tiagolobocastro force-pushed the fabrics-connect branch from e0f2b9a to 8893313 August 6, 2024 16:14

bors-openebs-mayastor bot pushed a commit that referenced this pull request Aug 6, 2024

Try #1711:

f8cb7f2

tiagolobocastro removed the request for review from hrudaya21 August 7, 2024 09:49

tiagolobocastro mentioned this pull request Aug 8, 2024

1 node down in a 3 node system caused many volumes to be Degraded/Faulted without a Target node #1714

Open

dsharma-dc approved these changes Aug 9, 2024

io-engine/src/subsys/nvmf/target.rs (outdated)

dsavitskiy approved these changes Aug 9, 2024

fix(nvmf/target): remove usage of block_on

d5b5d44

Split creating from starting the subsystem. This way we can start the subsystem in master reactor, and then move to the next spdk subsystem. Signed-off-by: Tiago Castro <tiagolobocastro@gmail.com>

tiagolobocastro force-pushed the fabrics-connect branch from 8893313 to d5b5d44 August 9, 2024 09:05

chore: add warning to block_on

d9e19b3

Should this become an unsafe function? Signed-off-by: Tiago Castro <tiagolobocastro@gmail.com>

tiagolobocastro force-pushed the fabrics-connect branch from 5c36253 to d9e19b3 August 9, 2024 09:56

bors-openebs-mayastor bot merged commit 6939b15 into release/2.7 Aug 9, 2024
5 checks passed

bors-openebs-mayastor bot deleted the fabrics-connect branch August 9, 2024 11:03
