Limit number of retries #194

Closed
tremby opened this issue Sep 22, 2016 · 13 comments

Comments

tremby commented Sep 22, 2016

When a task fails (for example, with an unhandled exception) it is put in the failed jobs list but also retried. There doesn't seem to be any limit on the number of retries, which means a job that is doomed to fail (due to my bad programming) keeps failing all day. I get a message sent to Rollbar each time, and that adds up to a lot of messages and eats my quota.

Is there a way to set a retry limit?

@manuganji

Is there a workaround for this?

@DEKHTIARJonathan

Is there any update on this? I have the exact same issue...
@tremby @manuganji did you find a workaround?

tremby (Author) commented Apr 15, 2017

I honestly don't recall, but I don't think so. Sorry!

@manuganji

No, I haven't found one. I just fixed the scenario where my task was failing and made sure I covered all the scenarios I could predict. An ugly workaround is to wrap your whole task code in a try/except block, re-raise only known exceptions and swallow all others (see the sketch below). Of course, this only works if you have a limited number of task types.
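
A minimal sketch of that workaround for a plain django-q task function; RecoverableError and do_work are placeholders for your own exception types and task body:

import logging

logger = logging.getLogger(__name__)

class RecoverableError(Exception):
    """Placeholder for the exceptions you actually want retried."""

def do_work(*args, **kwargs):
    """Placeholder for the real task body."""
    ...

def my_task(*args, **kwargs):
    try:
        return do_work(*args, **kwargs)
    except RecoverableError:
        raise  # known/transient failures: let the broker retry them
    except Exception:
        # Swallow everything else so the task is not redelivered forever;
        # log it so the permanent failure is still visible.
        logger.exception("Task failed permanently, not retrying")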

Eagllus (Collaborator) commented Apr 18, 2017

One way I resolved this issue is with a function called retry.
It is designed to work with iLOs that are unreliable in their responses/connections.

import time

def retry(func, *args, **kwargs):
    """
    A custom retry helper for setting iLO information.
    """
    count = kwargs.get('count', 0)
    max_retries = kwargs.get('max_retries', 3)
    countdown = kwargs.get('countdown', 30)
    exc = kwargs.get('exc', BaseException)

    if count < max_retries:
        # Wait before the next attempt, then call the function again with an
        # incremented attempt counter so it can keep track of the retries.
        time.sleep(countdown)
        count += 1
        return func(*args, count=count, max_retries=max_retries, countdown=countdown)
    else:
        # Out of attempts: re-raise the last exception that was passed in.
        raise exc

I used it like this

def set_host_power(ilo, values, **kwargs):
    try:
        return ilo.call_ilo('set_host_power_saver', values['host_power_saver'])
    except (IloCommunicationError, socket.timeout, socket.error) as exc:
        return retry(set_host_power, ilo, values, exc=exc, **kwargs)

This allows retry to retry the call up to max_retries times (default: 3) before raising the exception.
You could also return a state instead of raising the exception.

@dangerski

If you are using AWS SQS, my workaround is to create a dead-letter queue and then, on your task queue, enable "Use Redrive Policy", which "Sends messages into a dead letter queue after exceeding the Maximum Receives." Then set the maximum receives to the number of retries you want to allow.
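
The same redrive policy can also be set with boto3; a sketch with illustrative queue identifiers and an illustrative maxReceiveCount:

import json
import boto3

sqs = boto3.client("sqs")

# Illustrative identifiers; substitute your own queue URL and DLQ ARN.
task_queue_url = "https://sqs.us-east-1.amazonaws.com/123456789012/django-q-tasks"
dead_letter_arn = "arn:aws:sqs:us-east-1:123456789012:django-q-tasks-dlq"

# After 5 unsuccessful receives, SQS moves the message to the
# dead-letter queue instead of redelivering it to the workers.
sqs.set_queue_attributes(
    QueueUrl=task_queue_url,
    Attributes={
        "RedrivePolicy": json.dumps({
            "deadLetterTargetArn": dead_letter_arn,
            "maxReceiveCount": "5",
        })
    },
)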

@pilgrim2go

@dangerski: I have the same problem, but even when I configure the AWS queue to use redrive, it doesn't help.
Diving into the code to see why.

Balletie added a commit to Balletie/django-q that referenced this issue Mar 9, 2018
If a task fails with an exception, it is retried until it
succeeds. This is contrary to what is said in the documentation: under
the "Architecture" section, heading "Broker" it says that even when a
task errors, it's still considered a successful delivery. Failed tasks
never get acknowledged however, thereby being retried after the
timeout period. See also issues Koed00#238 and Koed00#194.

This patch adds an option to acknowledge failures, thereby closing
issue Koed00#238. Issue Koed00#194 would require some more work. The default of
this option is set to `False`, thereby maintaining backwards
compatibility.
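
For reference, the option this patch adds (ack_failures, as mentioned in the comments below) is configured in the Q_CLUSTER dict in the Django settings; a minimal sketch, with the other values purely illustrative:

# settings.py
Q_CLUSTER = {
    "name": "myproject",   # illustrative
    "workers": 4,          # illustrative
    "timeout": 60,
    "retry": 120,
    "ack_failures": True,  # acknowledge failed tasks so the broker does not redeliver them
}
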
@mm-matthias

The ack_failure/ack_failures options do not work for tasks that fail due to a timeout. After the worker is killed, the task remains in the queue. Once the worker is reincarnated, it fetches the task, runs into the timeout and the whole thing starts all over again.

Waszker commented Mar 17, 2020

@mm-matthias I have exactly the same problem. Have you found a solution for it? Is there a way to specify the number of retries before termination?

@mm-matthias

@Waszker We have been using celery/redis instead of django-q for more than a year, so I don't have a solution for this problem.

Waszker commented Mar 17, 2020

@Waszker We have been using celery/redis instead of django-q for more than a year, so I don't have a solution for this problem.

That's a pity... Thanks for the answer though...

@timomeara (Contributor)

I have a PR for a retry limit:
#466
Try it out :)
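
Assuming that PR is the one that introduced the max_attempts setting (check the PR itself for the exact name and semantics), it would be another Q_CLUSTER key; a sketch with illustrative values:

# settings.py
Q_CLUSTER = {
    "name": "myproject",  # illustrative
    "workers": 4,         # illustrative
    "retry": 120,
    "max_attempts": 3,    # assumed option name: give up on a task after 3 failed attempts
}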

@abhishek-compro

This can be closed

@tremby closed this as completed Aug 16, 2022