-
Notifications
You must be signed in to change notification settings - Fork 47
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
server: increase the default for conn_retries #81
Conversation
The rpc client uses 0.2 sec sleeps between retries [1], so with the old default (3) it was waiting max 0.6 sec before giving up, and it was not always enough for the spdk target to launch and start listening the rpc socket. With the new default (10) it will have about 2 sec. It is still not ideal, as I might imagine cases where it would be still not enough (like a testing virtual environment with highly limited resources), but I am not sure we need a more complex solution. I don't want to add a sleep after spawning the spdk target, as it will unnecessary slow down "fast" startup. We could use our own loop, which could do retries smarter, like the sleep time growing after unsuccessful tries, but I am not sure we should bother with this. We might just want to increase the retries more if 10 is not enough. And as this is an internal thing I am not sure we want this config parma in the example config. I.e. we still may have this configurable but not show in the example, or have it commented, to show it is not necessary. [1] https://github.com/ceph/spdk/blob/47737f16b47522cbc41b5106c69598cd7ff1a76a/scripts/rpc/client.py#L54 |
Please add (most of) the #81 (comment) to the commit message -- at least the first paragraph and the first sentence of the second paragraph definitely deserve to be noted in the commit itself. |
@idryomov Thanks, updated. |
I somehow didn't post the rest of the review yesterday (it was sitting there in pending comments). Apologies for the churn. |
Currently the rpc client uses 0.2 sec sleeps between retries, so with the old default (3) it was waiting max 0.6 sec before giving up, and it was not always enough for the spdk target to launch and start listening the rpc socket. With the new default (10) it will have about 2 sec. It is not ideal, as there is still a chance for it to fail (like a testing virtual environment with highly limited resources), but we probably don't need a more complex solution right now. Fixes: ceph#43 Signed-off-by: Mykola Golub <mykola.golub@clyso.com>
@idryomov updated. thanks |
Fixes: #43