Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Understanding num_retries in discovery.build()? #1049

Closed
yohplala opened this issue Sep 26, 2020 · 5 comments · Fixed by #1053
Closed

Understanding num_retries in discovery.build()? #1049

yohplala opened this issue Sep 26, 2020 · 5 comments · Fixed by #1053
Assignees
Labels
type: question Request for information or clarification. Not an issue.

Comments

@yohplala
Copy link

yohplala commented Sep 26, 2020

Hi @DFrenkel @busunkim96,

CONTEXT
Recently, I got an HttpError 502 when running googleapiclient.discovery.build().

  File "/usr/local/lib/python3.8/dist-packages/my_app-0.3.1-py3.8.egg/my_app/data/gd.py", line 58, in __init__
    self.drive = googleapiclient.discovery.build('drive', 'v3', http=auth_http)
  File "/usr/local/lib/python3.8/dist-packages/googleapiclient/_helpers.py", line 134, in positional_wrapper
    return wrapped(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/googleapiclient/discovery.py", line 292, in build
    raise e
  File "/usr/local/lib/python3.8/dist-packages/googleapiclient/discovery.py", line 268, in build
    content = _retrieve_discovery_doc(
  File "/usr/local/lib/python3.8/dist-packages/googleapiclient/discovery.py", line 365, in _retrieve_discovery_doc
    resp, content = req.execute(num_retries=num_retries)
  File "/usr/local/lib/python3.8/dist-packages/googleapiclient/_helpers.py", line 134, in positional_wrapper
    return wrapped(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/googleapiclient/http.py", line 907, in execute
    raise HttpError(resp, content, uri=self.uri)
googleapiclient.errors.HttpError: <HttpError 502 when requesting https://www.googleapis.com/discovery/v1/apis/drive/v3/rest returned "Bad Gateway">

I could read about this error from this SO thread.
Quest.
Is anyone else getting a 502 Bad Gateway error when trying to access https://www.googleapis.com/discovery/v1/apis/drive/v2/rest
Ans.
It appears to have started working again after being down for ~ 1 hr. Nothing needed to be done.

Additionally, Google doc suggests to address 5xx HttpError with a retry strategy.

Heading in this direction, I finally noticed that a num_retries made its apparition in build a few weeks ago thanks to you @DFrenkel 👍 (PR)

So my trouble seems like easy to solve with this parameter. Nonetheless I have 2 questions.

QUESTION 1: Setting an appropriate num_retries value

  • What is the max time discovery can be down? Is this 1hr true? This seems to me like a large time, but then why not. I am not an expert on this topic.
  • What would be the appropriate num_retries from this max time?
    I read the formula used to assess the sleep time between 2 retries:
    sleep_time = rand() * 2 ** retry_num

    random() giving a number between 0 and 1, I considered it being equal to 0.5. I could then see that with 12 retries, I could have a process retrying for about 1hr 10 minutes. Do you consider this correct?

QUESTION 2: num_retries inheritance?
Does the subsequent num_retries in execute() inherit as default value the value set for num_retries in build()?

Thanks a lot for your help and support.
Have a good day,
Bests,

PS: num_retries does not appear documented yet in source doc, which is where I 1st looked to see how I could implement it.
I most notably found this other thread, suggesting to subclass HttpRequest, with default num_retries. Seemed like another level of work to conduct, and very happy you could implement it directly in build()! :) Thanks!

@yoshi-automation yoshi-automation added the triage me I really want to be triaged. label Sep 27, 2020
@busunkim96 busunkim96 added type: question Request for information or clarification. Not an issue. and removed triage me I really want to be triaged. labels Sep 28, 2020
@DFrenkel
Copy link

DFrenkel commented Oct 2, 2020

@yohplala, thank you for your questions.

In regards to guidance related to appropriate number of retries: Guidance of setting a number of retries can vary, but we suggest that using large number of retries should not be considered as a mechanism to avoid longer endpoint outages. Retries with exponential backoff (with jitter) is well-known mechanism in the industry, and it is positioned as a correct response of a well-behaved client to occasional, short-lived throttling (503) or gateway timeouts (504). In fact, we should probably reconsider making error 502 retriable as it signifies an invalid response from the upstream server, therefore making 502 errors not temporal or transient in nature.
Setting a number of retries too high (above 2 or 3) can cause your own process to not use computing resources efficiently due to exponentially increasing waiting/sleeping times, and therefore may cause a cascading failure. It may be better to fail faster, after one or two retries, and have an alternative mechanism to handle an apparent outage, for example, by retrying the entire workflow or by returning an error to your customer that your customer’s client can handle with its own retries.

Answering your second question: yes, the number of retries is passed to downstream methods. If you take a look at this line, num_retries value is passed to the actual worker method responsible for fetching discovery document from the http endpoint, and then also passed to the http wrapped via this line.

Finally, thank you for reporting a miss in documentation update. We will refresh the docs to properly document usage of num_retries.

@yohplala
Copy link
Author

yohplala commented Oct 5, 2020

Thanks @DFrenkel for your very complete answer!
All clear for me!
Thanks again!

@eamon-woods
Copy link

the num_retries parameter for discovery.build in the reference doc is documented as num_retries: Integer, number of times to retry discovery with randomized exponential backoff in case of intermittent/connection issues. This line only mentions retry discovery, so it does not look like it implies that the number of retries is passed to downstream methods. is it still true that the number of retries is passed to downstream methods? can that behavior be relied on even though it's not stated in the reference doc?

@DFrenkel
Copy link

Yes, the number of retries is passed to downstream methods during discovery. I cover this aspect in my comment above:

yes, the number of retries is passed to downstream methods. If you take a look at this line, num_retries value is passed to the actual worker method responsible for fetching discovery document from the http endpoint, and then also passed to the http wrapped via this line.

@jakunow
Copy link

jakunow commented Oct 19, 2021

@DFrenkel there is a dependency for num_retries on static_discovery

Link to code that you've provided in previous comment link to commit is pointing to old commit.
Currently in this code for version 2.27.0 function req.execute will be run only if static_discovery=False.

If static_discovery is not provided in discovery.build() it will default to False and num_retries won't be used in downstream methods.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type: question Request for information or clarification. Not an issue.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants