Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Provide a @retry decorator to automatically retry Python functions in Checkbox jobs (new) #1453

Merged
merged 6 commits into from
Sep 11, 2024

Conversation

pieqq
Copy link
Collaborator

@pieqq pieqq commented Sep 4, 2024

Description

This PR does two things:

  • Implement a @retry decorator that can be used on any Python functions to automatically re-run it several times with an added delay/backoff/jitter when it raises an exception
  • Modify the networking_http.py script to make use of this

More info about the decorator:

When a function is decorated with @retry, it will be retried until it
passes or until the maximum defined attempts have been reached:

>>> from checkbox_support.helpers.retry import retry
>>> @retry(max_attempts=3, delay=10)
... def my_failing_test(arg):
...     return 1 / arg
... 
>>> result = my_failing_test(0)

===========
Attempt 1/3
===========
Attempt 1 failed:
division by zero

Waiting 3.61 seconds before retrying...

===========
Attempt 2/3
===========
Attempt 2 failed:
division by zero

Waiting 6.63 seconds before retrying...

===========
Attempt 3/3
===========
Attempt 3 failed:
division by zero

All the attempts have failed!
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/pieq/dev/work/checkbox/checkbox-support/checkbox_support/helpers/retry.py", line 89, in _f
    return run_with_retry(f, max_attempts, delay, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/pieq/dev/work/checkbox/checkbox-support/checkbox_support/helpers/retry.py", line 56, in run_with_retry
    result = f(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^
  File "<stdin>", line 3, in my_failing_test
ZeroDivisionError: division by zero
>>> @retry(max_attempts=3, delay=10)
... def my_passing_test(arg):
...     return 10 * arg
... 

>>> result = my_passing_test(5)

===========
Attempt 1/3
===========
>>> result
50

Resolved issues

https://warthogs.atlassian.net/browse/CHECKBOX-1543

Note: This PR implements the decorator, but it does not modify the WiFi tests to make use of it. This will be done in a separate PR.

Documentation

Documented in the commit messages. Since this is part of checkbox-support, it is not documented in the readthedocs documentation.

Tests

Unit tests are provided, and I also ran it on the networking/http job in Checkbox:

The job works as before when everything is OK (it just adds an "Attempt 1/x" at the beginning of the outcome):

$ TRANSFER_SERVER="ubuntu.com" checkbox-cli run com.canonical.certification::networking/http
===========================[ Running Selected Jobs ]============================
=========[ Running job 1 / 1. Estimated time left (at least): 0:00:00 ]=========
-----------[ Ensure downloading files through HTTP works correctly. ]-----------
ID: com.canonical.certification::networking/http
Category: com.canonical.plainbox::networking
... 8< -------------------------------------------------------------------------
URL transformed to HTTPS due to an HSTS policy
--2024-09-04 16:53:57--  https://ubuntu.com/
Resolving ubuntu.com (ubuntu.com)... 185.125.190.29, 185.125.190.21, 185.125.190.20, ...
Connecting to ubuntu.com (ubuntu.com)|185.125.190.29|:443... connected.
HTTP request sent, awaiting response... 
  HTTP/1.1 200 
  server: nginx/1.14.0 (Ubuntu)
  date: Wed, 04 Sep 2024 14:53:58 GMT
  content-type: text/html; charset=utf-8
  content-length: 119652
  x-view-name: canonicalwebteam.templatefinder.templatefinder.template_finder
  x-clacks-overhead: GNU Terry Pratchett
  permissions-policy: interest-cohort=()
  cache-control: max-age=60, stale-while-revalidate=86400, stale-if-error=300
  x-content-type-options: NOSNIFF
  x-vcs-revision: 1725458684-6a26498
  x-request-id: ab5646487d3efa085b92baabd9097b69
  strict-transport-security: max-age=15724800
  link: <https://assets.ubuntu.com>; rel=preconnect; crossorigin, <https://assets.ubuntu.com>; rel=preconnect, <https://res.cloudinary.com>; rel=preconnect
  x-cache-status: HIT from content-cache-il3/2
  accept-ranges: bytes
Length: 119652 (117K) [text/html]
Saving to: ‘/dev/null’

     0K .......... .......... .......... .......... .......... 42%  391K 0s
    50K .......... .......... .......... .......... .......... 85% 1.17M 0s
   100K .......... ......                                     100% 2.06M=0.2s

2024-09-04 16:53:58 (658 KB/s) - ‘/dev/null’ saved [119652/119652]

Attempt 1/5
===========

------------------------------------------------------------------------- >8 ---
Outcome: job passed
Finalizing session that hasn't been submitted anywhere: checkbox-run-2024-09-04T14.53.51
==================================[ Results ]===================================
 ☑ : Ensure downloading files through HTTP works correctly.

And it tries 5 times with a moving delay when failing (here, the HTTP URL to use has been left empty, so wget fails with an Invalid host name error):

$ checkbox-cli run com.canonical.certification::networking/http
===========================[ Running Selected Jobs ]============================
=========[ Running job 1 / 1. Estimated time left (at least): 0:00:00 ]=========
-----------[ Ensure downloading files through HTTP works correctly. ]-----------
ID: com.canonical.certification::networking/http
Category: com.canonical.plainbox::networking
... 8< -------------------------------------------------------------------------
http://: Invalid host name.
http://: Invalid host name.
http://: Invalid host name.
http://: Invalid host name.
http://: Invalid host name.

===========
Attempt 1/5
===========
Attempt 1 failed:
Command '['wget', '-SO', '/dev/null', 'http://']' returned non-zero exit status 1.

Waiting 20.92 seconds before retrying...

===========
Attempt 2/5
===========
Attempt 2 failed:
Command '['wget', '-SO', '/dev/null', 'http://']' returned non-zero exit status 1.

Waiting 27.41 seconds before retrying...

===========
Attempt 3/5
===========
Attempt 3 failed:
Command '['wget', '-SO', '/dev/null', 'http://']' returned non-zero exit status 1.

Waiting 27.43 seconds before retrying...

===========
Attempt 4/5
===========
Attempt 4 failed:
Command '['wget', '-SO', '/dev/null', 'http://']' returned non-zero exit status 1.

Waiting 44.94 seconds before retrying...

===========
Attempt 5/5
===========
Attempt 5 failed:
Command '['wget', '-SO', '/dev/null', 'http://']' returned non-zero exit status 1.

All the attempts have failed!
Command '['wget', '-SO', '/dev/null', 'http://']' returned non-zero exit status 1.
------------------------------------------------------------------------- >8 ---
Outcome: job failed
Finalizing session that hasn't been submitted anywhere: checkbox-run-2024-09-11T07.50.18
==================================[ Results ]===================================
 ☒ : Ensure downloading files through HTTP works correctly.

When a function is decorated with @Retry, it will be retried until it
passes or until the maximum defined attempts have been reached:

    >>> from checkbox_support.helpers.retry import retry
    >>> result my_failing_test()
    KeyboardInterrupt
    >>> @Retry(max_attempts=2, max_delay=10)
    ... def my_failing_test(arg):
    ...     return 1 / arg
    ...
    >>> result = my_failing_test(0)
    Attempt 1/2
    ===========
    Attempt 1 failed:
    division by zero

    Waiting 2.22 seconds before retrying...

    Attempt 2/2
    ===========
    Attempt 2 failed:
    division by zero

    All the attempts have failed!

    >>> @Retry(max_attempts=2, max_delay=10)
    ... def my_passing_test(arg):
    ...     return 1 * arg
    ...
    >>> result = my_passing_test(0)
    Attempt 1/2
    ===========

    >>> result
    0
Copy link

codecov bot commented Sep 4, 2024

Codecov Report

Attention: Patch coverage is 94.11765% with 3 lines in your changes missing coverage. Please review.

Project coverage is 46.23%. Comparing base (0881101) to head (73481d7).
Report is 126 commits behind head on main.

Files with missing lines Patch % Lines
providers/base/bin/networking_http.py 75.00% 2 Missing ⚠️
checkbox-support/checkbox_support/helpers/retry.py 97.67% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1453      +/-   ##
==========================================
+ Coverage   45.87%   46.23%   +0.36%     
==========================================
  Files         367      368       +1     
  Lines       39156    39243      +87     
  Branches     6622     6636      +14     
==========================================
+ Hits        17961    18144     +183     
+ Misses      20504    20401     -103     
- Partials      691      698       +7     
Flag Coverage Δ
checkbox-support 60.29% <97.67%> (+0.77%) ⬆️
provider-base 20.59% <75.00%> (+0.67%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Collaborator

@Hook25 Hook25 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nicely done! I like this
A few changes, major I would like to see is the patch, it makes testing this easier (mocking decorators is tricky). Also, please don't change the exception, I see no upside to that tbh.

checkbox-support/checkbox_support/helpers/retry.py Outdated Show resolved Hide resolved
checkbox-support/checkbox_support/helpers/retry.py Outdated Show resolved Hide resolved
checkbox-support/checkbox_support/tests/test_retry.py Outdated Show resolved Hide resolved
providers/base/tests/test_networking_http.py Outdated Show resolved Hide resolved
- Raise the encountered exception (instead of SystemExit) when all the
attempts have been tried
- Separate the actual retry code into its own function,
`run_with_retry()`: this allows the retry command to be used without a
decorator
- Remove initial delay and backoff factor from function signatures:
these are not modified often in practice, whereas max_attempts and delay
are more useful
- Make the decorator arguments mandatory: instead of using default
values, let the test author decide how many attempts and what delay to
use when using the @Retry decorator
- Add a `fake_run_with_retry()` function to facilitate unit testing of
code using the retry decorator
If test author try to use attempts or delay with a value less than 1,
ValueError is raised.
@pieqq pieqq merged commit 99f7a0c into main Sep 11, 2024
47 checks passed
@pieqq pieqq deleted the 1543-jitter-backoff-decorator branch September 11, 2024 08:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants