Timeout when trying to bootstrap or ssh spot request instances #372
Comments
It may just be a timing issue in terms of how/when things come up on AWS itself. I'm afraid I don't personally have much experience with spot instances though. You might consider decomposing the bootstrap method and adding a larger delay before it tries to check
@geemus Thanks for the suggestion. I had already decomposed I can open a PR with that change, but I suspect the few extra seconds are likely going to be different for different cloud providers, instance types, zones, and other factors. Would it make sense to use an exponential decay for that call?
@jeremywadsack thanks for taking the time to dig in. I imagine I set it at 10 because it was easy and seemed good enough at the time. I would tend to agree that it is brittle though (especially across providers). Exponential backoff sounds good, as long as we can keep the code for it from getting too complex. Would you be up for taking a pass at that? Thanks!
Happy to take a stab at a Pull Request.
You had it set at 8, by the way, which makes me think some kind of analysis went into it. I found 11 to be the limit in my testing, but I suspect it would cause the same problem later without exponential backoff.
On Wed, Jul 12, 2017 at 7:09 AM Wesley Beary wrote:
@jeremywadsack thanks for taking the time to dig in. I imagine I set it at 10 because it was easy and seemed good enough at the time. I would tend to agree that it is brittle though (especially across providers). Exponential backoff sounds good, as long as we can keep the code for it from getting too complex. Would you be up for taking a pass at that? Thanks!
--
Jeremy Wadsack
Sure, sounds good.
… not respond in 8s
Issue fog/fog-aws#372
For AWS Spot Requests, instances never complete the `#setup` process and eventually time out in the `#wait_for` block, because the default 8s is apparently not long enough to establish the ssh connection. In testing, 11s seemed to work, but this is likely highly variable across regions, instance types, images, and providers. This change increases the timeout by a factor of 1.5 each time `#sshable?` is called, capped at 60s. If a successful connection is made, the timeout is reset to the initial value.
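For readers following along, the backoff described in the commit message above can be sketched roughly as follows. This is a minimal illustration and not the code from the PR: `BackoffTimeout`, `next!`, `reset!`, and `sshable_with_backoff?` are hypothetical names, the connection attempt is passed in as a block instead of calling fog, and the 1.5 figure is assumed to be a multiplier.

```ruby
require "timeout"

# Hypothetical helper (not part of fog): tracks a timeout that grows by a
# fixed factor on every attempt, up to a cap, and can be reset on success.
class BackoffTimeout
  attr_reader :current

  def initialize(initial: 8, factor: 1.5, max: 60)
    @initial = initial
    @factor = factor
    @max = max
    @current = initial
  end

  # Return the timeout for this attempt, then grow it for the next one.
  def next!
    value = @current
    @current = [@current * @factor, @max].min
    value
  end

  # Go back to the initial timeout, e.g. after a successful connection.
  def reset!
    @current = @initial
  end
end

# Hypothetical wrapper: the block stands in for the real SSH probe and
# receives the timeout (in seconds) to use for this attempt.
def sshable_with_backoff?(backoff)
  connected = yield(backoff.next!)
  backoff.reset! if connected
  connected
rescue Timeout::Error, SystemCallError
  # A timed-out or refused connection counts as "not sshable yet".
  false
end

backoff = BackoffTimeout.new
timeouts = 7.times.map { backoff.next! }
# timeouts is [8, 12.0, 18.0, 27.0, 40.5, 60, 60]
backoff.reset!
```

With these assumed defaults the timeout reaches the 60s cap on the sixth attempt, so a slow-starting instance gets progressively more room without any single call waiting longer than the cap.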
@geemus Opened a PR with a solution for fixing that. Let me know if this makes sense, any concerns, etc.
@jeremywadsack thanks, it's looking pretty good (sorry for the delay, I was on vacation last week). I'm going to go ahead and close this before I forget/lose it. I'm pretty confident we'll get your PR (or something quite like it) in soon. Thanks!
I'm running into an issue where I can launch an instance and configure it without a problem. If I try to do the same with a spot request instance it hangs and eventually times out.
Calling `launch(...)` raises `Fog::Errors::TimeoutError` from the `bootstrap` call.
The problem appears to be that it's waiting to be `sshable?`, which never returns `true`, so it times out.
I dug into that method and logged the errors and it appears to time out on the connection:
While it's retrying AWS shows the instance as ready and I'm able to ssh into the instance using the credentials.
If I change this to `servers.bootstrap` then it works just fine, even though the `sshable?` call is the same for both.
I'm running `fog-aws` 1.4.0.
1.4.0.I'm new to spot instances so it's possible that I've configured something wrong, but I'm not sure what I need to change. Looking for guidance or anything I can do to help resolve this or help provide a fix.
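To illustrate the failure mode described in this report, here is a minimal sketch of a polling loop that gives up after a deadline. It is not fog's `wait_for` implementation: `poll_until` and `PollTimeoutError` are hypothetical names, and the block stands in for a check such as `sshable?`.

```ruby
# Hypothetical error class, standing in for a library-specific timeout error.
class PollTimeoutError < StandardError; end

# Hypothetical helper: call the block every `interval` seconds until it
# returns a truthy value, or raise once `timeout` seconds have elapsed.
def poll_until(timeout:, interval: 1)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  loop do
    return true if yield
    if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
      raise PollTimeoutError, "condition not met within #{timeout}s"
    end
    sleep(interval)
  end
end

# A check that never succeeds ends in a timeout error, no matter how
# healthy the instance looks from the outside.
begin
  poll_until(timeout: 0.05, interval: 0.01) { false }
rescue PollTimeoutError => e
  puts "timed out: #{e.message}"
end
```

If every individual check fails because its own connection timeout is too short, the outer loop keeps retrying until its deadline passes, which would match the symptom above.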