-
Notifications
You must be signed in to change notification settings - Fork 376
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Token refresh doesn't work in traffic storm #1193
Comments
@inlined how is this function staying hot for 15 minutes (and possibly longer) to experience a token expiration? |
It's a periodic function that runs every 15m. Thanks to the (newer) architecture in the Cloud Functions backend, the code that that implements |
Thanks @inlined. @trentbrooks, can I ask you to share the token expiry time logs you've collected from your runs? I see you have shared a couple of log entries at firebase/firebase-functions#579 (comment). Do you have any more? Update: Also how long does each invocation of your function last approximately (i.e. the time to send all notifications)? |
Hi @hiranya911 Here are a couple of screenshots from today with the logs. It's been working consistently since @inlined suggested the fix a few days ago. Function runs scheduled every 15 minutes and sends 250-300 topic messages (promises), and function finishes each run in 1500-5500 milliseconds. https://pasteboard.co/JS4AoFY.png Code: |
Thanks for verifying the execution time of your triggers. That takes some of the uncertainty out of the picture. The code looks like you're logging the token expiry time after a forced refresh. @trentbrooks can you please confirm that's indeed what we're seeing in the logs? Assuming that's the case, this is the summary of the log:
@inlined does this look reasonable to you? Why would the auth/metadata server send a 15 min token at 3:00, and then send a 30 min token at 3:15? We see that again at 4:00 and 4:15. Is it juggling multiple tokens for the same client? Ideally I would have expected to see some tokens with full 60 minutes expiry time, and even a few with 45 minutes expiry time. @trentbrooks do you see anything like that in the remainder of your logs? |
I'll try to comment while using as few codewords as possible... In App Engine, Cloud Functions, and Cloud Run, there's a single server (we'll call it the "load balancer") per machine that is responsible for creating and deleting the sandboxes that runs users' code. It provides a fully virtualized environment, including the network stack, and may directly implement things like a filesystem or HTTP calls to Because the cache is at the load balancer, it means that the cache can be refreshed from any instance of the function and the TTL is more of a global TTL. As to why we're getting 30m tokens instead of 60m tokens... I can't answer that. But it seems that we get 30m tokens and would previously have executed with only a few seconds remaining before Trent was doing forced refreshes. My suspicion is that the token had <5s remaining, it wasn't expired so we returned it, and the requests queued in the socket pool until the token was no longer valid. |
I assume there's only one instance of this particular function (the logs certainly seems to suggest so). That makes it even more strange as to why we are not getting 60min tokens. Also this implies the SDK's built-in proactive token refresher (which uses Node.js timers) no longer works in Functions. |
Here are a couple more pages of fresh logs from today. All working as expected, so no complaints from me. Happy to change the logs if there is something else if it's more useful? https://pasteboard.co/JSbmQHW.png Yes there is 1 instance of this particular function '15m_binance' which does the force token refresh. I have similar functions (without force token refresh) for the 1h, 4h, 1d (daily) and 1w (weekly). Is the refreshed admin token shared across all functions? Or is each function completely independent and requests it's own token? Note- I also have 5 similar scheduled functions for a different exchange ie. '15m_fx', '1h_fx', etc. I have not touched any of these yet, and the 15 minute is still producing the 401 auth error regularly. Happy to provide any assistance. |
It used to be independent. But with the new architecture it's anybody's guess. May be that's why we don't see 45min and 60min tokens in the logs? May be the token is being refreshed at the load balancer by other function instances? |
@inlined I've proposed a set of changes that can solve this type of issues in a new PR. PTAL. |
This is a reassignment to close out firebase/firebase-functions#579. The customer's function makes many many Admin SDK API calls every 15 minutes. Previously it would crash with 401 errors because the API server was no longer recognizing the token as valid. The customer was able to work around this error by calling
admin.app().INTERNAL.getToken(/*force_refresh=*/true)
before their code ran, so it's definitely an Admin bug.Given the frequency of the crashes, I expect that this bug is happening on the border of when a token is about to expire and the Admin SDK isn't properly refreshing it. I have two theories for what could cause this:
My recommendation would probably be something surgical like changing this line to subtract a few minutes from the actual expiration time so that we don't serialize requests with a token that's about to expire.
The text was updated successfully, but these errors were encountered: