Retry when it could not get the pull request #101

Merged
asmeurer merged 1 commit into sympy:master on Mar 15, 2012

Conversation

asmeurer
Member

This also changes all retries to use a doubling wait scheme: it waits for
1 second, then 2 seconds, then 4 seconds, then 8 seconds, and so on. This is,
for example, what GMail does. It prevents us from DOSing the server.

The easiest way to test this is to disconnect yourself from the internet and
try to review a pull request.

Closes #96.
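
As a rough illustration of the scheme (a sketch only, not the patch itself; the urlopen_with_retry name and the choice of urllib2.URLError are just for the example):

import time
import urllib2

def urlopen_with_retry(url):
    # Illustration only: keep doubling the wait (1, 2, 4, 8, ... seconds)
    # until the request finally goes through.
    timer = 1
    while True:
        try:
            return urllib2.urlopen(url)
        except urllib2.URLError:
            print "Error while accessing %s, retrying in %d seconds..." % (url, timer)
            time.sleep(timer)
            timer *= 2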

@MichaelMayorov
Contributor

What's the point of using doubling?
I think the timeout should be optional and defined by the user (e.g., for a slow internet connection over GPRS).
Maybe choose a default timeout (2-3 seconds) and let the user change it with an option like --timeout.

@MichaelMayorov
Contributor

A count of tries would also be useful. No one will want to wait several minutes between tries.

@asmeurer
Member Author

If you get to the point where it's waiting more than a minute between tries, it's likely that the server is down, or there is some other problem (internet connection). If that happens, you should correct the problem, and restart. This is designed to prevent DOSing the system when run unmanned. If you are manning the system and see that there's a problem, you should step in and fix it, or if it cannot be fixed (such as if the server is down), pull the bot offline until it is fixed.

A timeout of 2-3 seconds has the potential to DOS the server. Imagine there is some bug or typo somewhere that causes it to run the loop endlessly. If the system is being run unmanned, then in one hour, with a three second timeout it will ping the server 60*60/3 = 1200 times. With this scheme, it will ping the server no more than log(60*60, 2) ~= 11 times. That's for one hour. If the system is run for 24 hours unmanned, the first number goes up to 60*60*24/3 = 28800 times, and the second becomes log(60*60*24, 2) ~= 16 times.

You might say that we should just make the timeout longer. But this doesn't really solve the problem. First off, it will still be a ton more pings than the doubling approach. Second, if we make the timeout longer than 5 or 10 seconds, it becomes an inconvenience to the user in the common case, where it just had an intermittent failure and will work the next time or the time after that. This way, if there is an intermittent failure, it will try again in 1 second, and again after that in 2 seconds. Depending on how intermittent the failure is, it will likely work at that point (the scheme adapts to different levels of intermittency, so even longer-term failures, like something that lasts 10 seconds, will be handled reasonably, without DOSing the server, which in some cases can make the failure last even longer).
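
To spell out that arithmetic (a back-of-the-envelope check, not code from the bot):

from math import log

T_HOUR = 60*60
T_DAY = 60*60*24

# Fixed 3-second timeout: one request every 3 seconds.
print T_HOUR / 3   # 1200 requests in an hour
print T_DAY / 3    # 28800 requests in a day

# Doubling scheme: the nth retry comes after 1 + 2 + ... + 2**(n - 1) = 2**n - 1
# seconds of total waiting, so roughly floor(log2(T)) retries fit in T seconds.
print int(log(T_HOUR, 2))   # ~11 retries in an hour
print int(log(T_DAY, 2))    # ~16 retries in a day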

@MichaelMayorov
Contributor

OK, I get your point.

But how will the administrator know that sympy-bot can't get pull requests? By the way, if the server goes down for a long time (half a day or more), then the timeout between tries could be more than 1 hour.

Besides that, it's a cron job. I don't know how GAE responds to running the same update script in 2 or more cron jobs (I mean the case when the first cron job has just started and hasn't finished before the second job starts).
But anyway, the administrator doesn't know about the problem until he sees it himself.

So, I suggest two things. First, use a counter of tries, so that if the server doesn't respond for a long time, it generates a failure message and stops. Second, if that happens, the application should forward this message to the administrator (for example, via email).

P.S. Also, maybe I just didn't find it, but does sympy-bot have a notification system? If not, may I include that as an idea in my GSoC proposal?

@asmeurer
Member Author

> But how will the administrator know that sympy-bot can't get pull requests? By the way, if the server goes down for a long time (half a day or more), then the timeout between tries could be more than 1 hour.

Yeah, that's expected. Remember that if this happens, then it means that the bot itself is running unmanned (or else someone would have noticed that it was just sitting there doing nothing). The best we can do in this case is make it not DOS the server, but occasionally retry in case the server comes back up (again, unmanned). For now, unmanned operation doesn't happen much, but when we implement sympy-bot work (#63) it will.

I don't understand what you are saying about cron jobs. What are you saying will be a cron job?

We don't have any kind of notifications, other than the comments to github. If you think they will be useful, feel free to open an issue for it and include it in your proposal. I suppose it could be useful, e.g., if you have sympy-bot work running on some old machine or server somewhere, to have it send notifications if it errors, or maybe a heartbeat so we can easily see if it goes down. Does the app engine have APIs for this sort of thing? If so, we could just do everything through the reviews site.

Anyway, how the admin knows about it is a different problem from this pull request. For now, we just implement the basic low-tech solution of printing a message to the terminal every time it retries. The main thing here is to prevent high amounts of pinging (DOSing), and, more importantly, to catch the URLError that has been annoying me for the past few weeks.

@MichaelMayorov
Contributor

> I don't understand what you are saying about cron jobs. What are you saying will be a cron job?

I meant http://reviews.sympy.org/update (which will run every 12 hours) and http://reviews.sympy.org/quickupdate (which will run every 2 hours); as I understand it, these scripts perform the update of the pull request list.

-            time.sleep(2)
+            print "Error while accessing pastehtml.com, retrying in %d seconds..." % timer
+            time.sleep(timer)
+            timer *= 2
Member

Yep, this is cool. Except that we don't use pastehtml anymore, right?

Member Author

No, but I figured that since this is still in here, I might as well update it. Maybe it was kept around in case we ever need to fall back to it, or something similar?

@certik
Member

certik commented Mar 14, 2012

I am +1 for this patch. If this works for you, feel free to merge it.

@asmeurer
Member Author

> I meant http://reviews.sympy.org/update (which will run every 12 hours) and http://reviews.sympy.org/quickupdate (which will run every 2 hours); as I understand it, these scripts perform the update of the pull request list.

No, these changes are only for sympy-bot itself, which runs locally. The webapp is only stuff in the web directory, which I did not touch here.

@asmeurer
Member Author

> I am +1 for this patch. If this works for you, feel free to merge it.

Thanks. I'm doing so now, since I can't really use SymPy-Bot except in this branch.

asmeurer added a commit that referenced this pull request Mar 15, 2012
Retry when it could not get the pull request
@asmeurer asmeurer merged commit 54a7485 into sympy:master Mar 15, 2012
Development

Successfully merging this pull request may close these issues.

urllib2.URLError: <urlopen error [Errno 54] Connection reset by peer>
3 participants