Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

server: increase the default for conn_retries #81

Merged
merged 1 commit into from
Mar 29, 2023
Merged

Conversation

trociny
Copy link
Contributor

@trociny trociny commented Mar 1, 2023

Fixes: #43

@trociny
Copy link
Contributor Author

trociny commented Mar 1, 2023

The rpc client uses 0.2 sec sleeps between retries [1], so with the old default (3) it was waiting max 0.6 sec before giving up, and it was not always enough for the spdk target to launch and start listening the rpc socket.

With the new default (10) it will have about 2 sec. It is still not ideal, as I might imagine cases where it would be still not enough (like a testing virtual environment with highly limited resources), but I am not sure we need a more complex solution. I don't want to add a sleep after spawning the spdk target, as it will unnecessary slow down "fast" startup.

We could use our own loop, which could do retries smarter, like the sleep time growing after unsuccessful tries, but I am not sure we should bother with this. We might just want to increase the retries more if 10 is not enough.

And as this is an internal thing I am not sure we want this config parma in the example config. I.e. we still may have this configurable but not show in the example, or have it commented, to show it is not necessary.

[1] https://github.com/ceph/spdk/blob/47737f16b47522cbc41b5106c69598cd7ff1a76a/scripts/rpc/client.py#L54

@idryomov
Copy link
Contributor

Please add (most of) the #81 (comment) to the commit message -- at least the first paragraph and the first sentence of the second paragraph definitely deserve to be noted in the commit itself.

@idryomov idryomov added this to the #2 milestone Mar 28, 2023
@trociny
Copy link
Contributor Author

trociny commented Mar 29, 2023

@idryomov Thanks, updated.

ceph-nvmeof.conf Outdated Show resolved Hide resolved
@idryomov
Copy link
Contributor

idryomov commented Mar 29, 2023

@idryomov Thanks, updated.

I somehow didn't post the rest of the review yesterday (it was sitting there in pending comments). Apologies for the churn.

Currently the rpc client uses 0.2 sec sleeps between retries, so with
the old default (3) it was waiting max 0.6 sec before giving up, and
it was not always enough for the spdk target to launch and start
listening the rpc socket.

With the new default (10) it will have about 2 sec. It is not ideal,
as there is still a chance for it to fail (like a testing virtual
environment with highly limited resources), but we probably don't need
a more complex solution right now.

Fixes: ceph#43
Signed-off-by: Mykola Golub <mykola.golub@clyso.com>
@trociny
Copy link
Contributor Author

trociny commented Mar 29, 2023

@idryomov updated. thanks

@idryomov idryomov merged commit 96e2ccd into ceph:devel Mar 29, 2023
@trociny trociny mentioned this pull request Aug 14, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

Unable to start control server
2 participants