TooManyProviderTokenUpdates issue on v0.14.1 #811
@zsophy111 If you turn on debug-level logging, `TokenAuthenticationApnsClientHandler` will log each time it generates a new token (see `pushy/src/main/java/com/eatthepath/pushy/apns/TokenAuthenticationApnsClientHandler.java`, line 108 at commit 70fe90c).
If you haven't already, could you please turn on debug logging for (at least) `TokenAuthenticationApnsClientHandler`?
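For reference, a minimal sketch of one way to do that programmatically, assuming Logback is the SLF4J backend (most setups would set the level in `logback.xml` instead):

```java
import ch.qos.logback.classic.Level;
import ch.qos.logback.classic.Logger;
import org.slf4j.LoggerFactory;

public class EnablePushyDebugLogging {
    public static void main(final String[] args) {
        // Pushy logs through SLF4J; with Logback as the backend, the level for
        // the token-authentication handler can be raised at runtime like this.
        final Logger tokenHandlerLogger = (Logger) LoggerFactory.getLogger(
                "com.eatthepath.pushy.apns.TokenAuthenticationApnsClientHandler");
        tokenHandlerLogger.setLevel(Level.DEBUG);
    }
}
```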
I can see why you'd think so reading the code, but there's a sneaky thing going on where all of this stuff is happening on a Netty "event loop," and everything on that event loop should happen in order. In other words, one instance of a handler is only ever touched by a single thread at a time, so these operations can't race with each other.
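A minimal illustration of that threading model (not Pushy code; this is Netty's standard pattern):

```java
import io.netty.channel.Channel;

final class EventLoopExample {
    // Tasks submitted to a channel's event loop all run on that loop's single
    // thread, in submission order, so per-channel handler state (such as the
    // current auth token) needs no additional synchronization.
    static void runOnChannelThread(final Channel channel, final Runnable task) {
        if (channel.eventLoop().inEventLoop()) {
            task.run(); // already on the channel's thread; run immediately
        } else {
            channel.eventLoop().execute(task); // queued behind earlier tasks
        }
    }
}
```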
It will take some effort in our logging system to turn on debug logging. In our application, we initialize only one APNs client per instance; however, on a single instance, I see multiple `nioEventLoopGroup` threads sending APNs push notifications. For example, on the same instance, when one thread […] There is further evidence of this issue: we set the configurable APNs auth token expiration time to one week, and on the same instance we saw two different threads […] Setting the configurable APNs auth token expiration time to one week or longer, so that the token expires passively, can be a temporary workaround for this issue, at least for our application. Pushy still automatically regenerates the auth token when it "passively" expires, but because of the roughly one-second latency, we still lose one or two push notifications during that window, every 60 minutes, on every instance.
Oops. Sorry about the accidental issue closure; that was just a stray click. Thanks for the additional detail! The threading discovery is very surprising (and alarming). I'll see what I can figure out.
BTW, we have also seen frequent […]
@zsophy111 I think we may have had a miscommunication. You wrote: […]

Does that client have multiple concurrent connections?
Hi Jon, thanks for the reply. I am not sure whether they are "concurrent" or not. From the logs, on one instance in one of our datacenters, there are multiple threads sending push notifications: one thread handles 99% of the traffic, and 3–4 other threads handle the rest. I believe they share the same APNs client but push different messages. Hope this clarifies things. We use `ApnsClient` with just the default configuration. I don't know much about Netty, but I can check our Netty settings if you think anything related to the Netty configuration could matter.
Okay. I think that answers my question. By default, clients use a single connection for sending notifications.

If we truly have multiple threads working on the same connection, that would be a serious bug. It's also behavior I can't reproduce in tests. When a channel is created, it's bound to a Netty "event loop," and that event loop is bound to a single thread. All operations for that channel should happen on that event loop. Is it possible that one connection is getting closed, then another is getting reopened on a different event loop? In other words, are you seeing log messages from different threads interleaved, or do you see a block of messages from one thread followed by a block of messages from a different thread? I'll add some more logging to make it more obvious when connections are opened or closed.
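In the meantime, one way to observe connection churn from the application side is a metrics listener. A hedged sketch, assuming the 0.14-era `ApnsClientMetricsListener` interface (registered via `ApnsClientBuilder#setMetricsListener`); the logging is illustrative:

```java
import com.eatthepath.pushy.apns.ApnsClient;
import com.eatthepath.pushy.apns.ApnsClientMetricsListener;

// Logs connection open/close events and write failures along with the thread
// that reported them, which would make connection churn visible in the logs.
public class ConnectionLoggingMetricsListener implements ApnsClientMetricsListener {

    @Override
    public void handleConnectionAdded(final ApnsClient apnsClient) {
        System.out.println("Connection added on " + Thread.currentThread().getName());
    }

    @Override
    public void handleConnectionRemoved(final ApnsClient apnsClient) {
        System.out.println("Connection removed on " + Thread.currentThread().getName());
    }

    @Override
    public void handleConnectionCreationFailed(final ApnsClient apnsClient) {
        System.out.println("Connection creation failed");
    }

    @Override
    public void handleWriteFailure(final ApnsClient apnsClient, final long notificationId) {
        System.out.println("Write failed for notification " + notificationId);
    }

    @Override
    public void handleNotificationSent(final ApnsClient apnsClient, final long notificationId) {}

    @Override
    public void handleNotificationAccepted(final ApnsClient apnsClient, final long notificationId) {}

    @Override
    public void handleNotificationRejected(final ApnsClient apnsClient, final long notificationId) {}
}
```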
Hi Jon, unfortunately, I think what I saw in the logging is the former case: log messages from different threads interleaved. At the millisecond level, nioEventLoopGroup-26-1 never stops doing pushes, while if I filter the nioEventLoopGroup-26-1 thread out, I see other threads occasionally doing pushes as well.
Hi Jon, […]
I'm not convinced that there's a threading problem here, for two reasons: […]
In any case, it's extremely difficult to debug this issue without more information. Without a reproduction case and without debug logs giving some sense of how often we're generating new tokens, all I can do is guess. I don't even have a way to check whether a proposed solution actually fixes the problem. I understand it's not trivial for you to do, but for me to make any progress on this issue, I really need to know when you're generating new tokens. Thanks for your understanding.
@supral Thank you, and please pardon the delay. This is helpful. Can you please clarify: […]

"I'm just using the defaults" is a perfectly good answer to either question. Thanks!
Also: I just opened (and will soon merge) #820, which addresses a problem that I think may have been keeping old connections in play longer than intended. That may resolve some or all of the noise in the logs, but I don't think it'll address the "too many updates" issue.
Folks, Pushy 0.14.2 has just been released and should be making its way to Maven Central within the next hour or so. Could you please update to the latest version? It includes a few logging changes that could be helpful in getting to the root of this problem. In particular: […]

Thank you!
This is how I initialize the APNs client: […] Hope this clarifies your question. Thanks.
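(The snippet itself was lost in this copy. As a stand-in, here is a minimal sketch of a typical token-authentication setup consistent with the 20 connections mentioned in the next comment; the key path, team ID, and key ID are placeholders:)

```java
import java.io.File;

import com.eatthepath.pushy.apns.ApnsClient;
import com.eatthepath.pushy.apns.ApnsClientBuilder;
import com.eatthepath.pushy.apns.auth.ApnsSigningKey;

public class ApnsClientFactory {
    static ApnsClient buildClient() throws Exception {
        return new ApnsClientBuilder()
                .setApnsServer(ApnsClientBuilder.PRODUCTION_APNS_HOST)
                // Placeholder key file and identifiers; substitute real values.
                .setSigningKey(ApnsSigningKey.loadFromPkcs8File(
                        new File("/path/to/token.p8"), "TEAMID1234", "KEYID67890"))
                .setConcurrentConnections(20)
                .build();
    }
}
```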
@supral In your case, you have 20 connections supported by 20 different threads. I don't think it's surprising or alarming to see multiple threads resetting tokens at about the same time (since all 20 connections were likely first created at about the same time). |
Hi, last night, with 0.14.1, one of our apps also got several hundred `TooManyProviderTokenUpdates` errors per server in one spike. Today we updated all servers to 0.14.2 and turned on logging for `TokenAuthenticationApnsClientHandler`. If the problem repeats, we will follow up with more information.
Thanks! The logs from 0.14.2 will identify which connections are updating their tokens and when; that information will be crucial for getting to the bottom of this problem. |
Hello, an update: this morning (local time) we again had a lot of pushes without responses. At around 09:55–10:00 GMT+3, we saw about 100 `TooManyProviderTokenUpdates` errors on two servers. We checked the logs: "Generating new token for stream..." was logged for any given channel ID only once every 50 minutes (sometimes the interval was 2,999 or 3,001 seconds, but that doesn't matter). So it seems to be an upstream issue.
Hi, today we again got `TooManyProviderTokenUpdates` on one server. The logs again show that a new token is generated on the same connection no more often than every 50 minutes. After that, for about 1.5 hours, many pushes went without responses (at least within the time we wait for them). Then we got an `Http2ChannelClosedException`, one connection closed, and everything started working normally again. This is with Pushy 0.14.2.
Ugh… well, that's both confusing and frustrating. Thank you for the update.
Two clarifications here: […]
Hi, I will answer tomorrow, because I'm away from my work PC right now.
Nov 24 16:06:50 DEBUG ... Generating new token for stream 448259 on channel [id: 0x6ba7eb6b...]

For other channels in that period, the logs show only messages about generating new tokens. When there were timeouts, we see calls to `ApnsClientMetricsListener::handleWriteFailure`; from the code, that appears to confirm that the response was not received. Also, from our charts, `Http2FrameLogger::logPing` was not called during that period, if that helps.

Thanks,
From the logs, we see that on connection establishment the server sends the setting MAX_CONCURRENT_STREAMS=1 and, shortly after that, MAX_CONCURRENT_STREAMS=1000 (at least in our case). Maybe, then, if Pushy receives MAX_CONCURRENT_STREAMS=1 and does not receive MAX_CONCURRENT_STREAMS > 1 within, say, a minute after that, it could close such connections (see the sketch below)? Thanks,
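A hypothetical sketch of that idea; the class, flag, and hook are illustrative stand-ins, not Pushy's actual handler code:

```java
import io.netty.channel.ChannelHandlerContext;
import io.netty.handler.codec.http2.Http2Settings;
import java.util.concurrent.TimeUnit;

// If the server's initial MAX_CONCURRENT_STREAMS=1 is never raised within a
// minute, assume the connection is stuck and close it so a fresh one opens.
final class StuckSettingsWatchdog {

    private volatile boolean sawUsableStreamLimit = false;

    // Would be invoked from the HTTP/2 handler whenever a SETTINGS frame arrives.
    void onSettingsRead(final ChannelHandlerContext context, final Http2Settings settings) {
        final Long maxConcurrentStreams = settings.maxConcurrentStreams();

        if (maxConcurrentStreams != null && maxConcurrentStreams > 1) {
            this.sawUsableStreamLimit = true;
        } else if (maxConcurrentStreams != null && maxConcurrentStreams == 1) {
            // Re-check in a minute; close the channel if the limit never rose.
            context.executor().schedule(() -> {
                if (!this.sawUsableStreamLimit) {
                    context.close();
                }
            }, 1, TimeUnit.MINUTES);
        }
    }
}
```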
Hello, today on one server, from around 14:05 to 14:55 (the granularity of this information is about 5 minutes), we again got a lot of `TooManyProviderTokenUpdates` errors. In the logs we found only one suspicious connection during that time:

Nov 27 14:09:54 ... [id: 0xd518d1b3...] INBOUND SETTINGS: ack=false settings={HEADER_TABLE_SIZE=4096, MAX_CONCURRENT_STREAMS=1, INITIAL_WINDOW_SIZE=65535, MAX_FRAME_SIZE=16384, MAX_HEADER_LIST_SIZE=8000}

In this case the channel recovered by itself. Also, during that time there were no timeouts for push responses. The logs again show that a new token was generated on the same connection no more often than every 50 minutes. Thanks,
Hi, I think it might be better if the multiple threads of an `ApnsClient` shared the same auth token instead of each generating its own.
To emphasize: there is no indication anywhere in the official documentation that this should be necessary, but I agree that it can't hurt. I took a shot at it in #850.
Oops—I shouldn't have closed this issue. My mistake. I'll put out a new release with shared tokens shortly, then ideally close this ticket after an observation period. |
We had a discussion with Apple support. One of the proposed solutions was to close the connection and open a new one whenever a `TooManyProviderTokenUpdates` error is received. Since the new year, we have gotten a few dozen of them, on 27.1 and 2.3.
Well, that's… frustrating. We can certainly add logic for that case, too. To be clear, I'm interpreting that suggestion to mean that a connection that receives a `TooManyProviderTokenUpdates` rejection should be closed and replaced with a fresh one.
The root cause of this problem is still not clear. I sent you an email with more details.
Thank you for the email; I think I'm convinced that we should try closing/reopening connections when we see `TooManyProviderTokenUpdates`.
This should be fixed in 0.15; if the problem persists after updating, please let us know and we'll re-open this issue. |
@jchambers We've run into this issue again with version 0.15. About 4,000 notifications failed with `TooManyProviderTokenUpdates`.
Honestly, my suspicion at this point is that it's an upstream issue. Did the problem self-resolve for you? |
I believe so, but it's hard to tell because the service is deployed, and thus restarted, frequently for various reasons anyway. For now we will add notifications that failed with this error to the retry queue we already have, and hopefully that will work around the problem. Perhaps it would make sense to document this phenomenon, because the docs for […]
That seems reasonable. Depending on what's up, it may also make sense to just automatically retry them. Let me see if I can learn a little more about what's going on here, then figure out what to do about it. |
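A hedged sketch of the retry-queue workaround described above. It assumes a 0.14-style API in which `PushNotificationResponse#getRejectionReason()` returns the raw reason string (newer Pushy versions wrap it differently); `retryQueue` is an application-side structure, not part of Pushy:

```java
import java.util.Queue;

import com.eatthepath.pushy.apns.ApnsClient;
import com.eatthepath.pushy.apns.util.SimpleApnsPushNotification;

public class TokenUpdateRetry {
    static void sendWithRetryOnTokenUpdates(
            final ApnsClient apnsClient,
            final SimpleApnsPushNotification notification,
            final Queue<SimpleApnsPushNotification> retryQueue) {

        apnsClient.sendNotification(notification).whenComplete((response, cause) -> {
            if (response != null
                    && !response.isAccepted()
                    && "TooManyProviderTokenUpdates".equals(response.getRejectionReason())) {
                // APNs rejected the notification for token churn; try again later.
                retryQueue.add(notification);
            }
        });
    }
}
```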
This issue is related to this one: #793
We are using Pushy for APNs push notifications in a complex enterprise application.
We originally used Pushy 0.13.10, but we continuously had a native memory growth issue, so we upgraded to 0.14.1.
v0.14.1 solved the memory issue; however, we have since encountered this `TooManyProviderTokenUpdates` issue, which we never saw on v0.13.10.
We think this issue comes from this Pushy fix 747: d42f6d7
In that fix, the APNs auth token is proactively invalidated (set to null) every 50 minutes by default, which makes the APNs client retrieve a new token after the invalidation. If one APNs client retrieves a new auth token from the APNs server more than once in 20 minutes, the server returns this `TooManyProviderTokenUpdates` error.
In our application, we don't create an APNs client per thread; our APNs client is shared among threads.
The ApnsClient doc (https://pushy-apns.org/apidocs/0.14/com/eatthepath/pushy/apns/ApnsClient.html) says: "APNs clients are intended to be long-lived, persistent resources. They are also inherently thread-safe and can be shared across many threads in a complex application." So I think we're using it correctly.
So in this Pushy fix 747, perhaps when the timer fires and the token is set to null, more than one thread tries to fetch a new token, which pushes us over the limit. A potential fix could be for the timer thread to fetch a new token and overwrite the old one, rather than just setting it to null (a sketch of that idea follows).
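An illustrative sketch of that proposal, not Pushy's actual internals (the token type and `generateToken()` are stand-ins): the timer atomically swaps in a fresh token, so readers never observe a null and never race to regenerate one.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

final class TokenRefresher {

    private final AtomicReference<String> tokenRef = new AtomicReference<>(generateToken());
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    TokenRefresher() {
        // Replace (rather than invalidate) the token every 50 minutes, so no
        // caller ever sees a null token and triggers its own regeneration.
        this.scheduler.scheduleAtFixedRate(
                () -> this.tokenRef.set(generateToken()), 50, 50, TimeUnit.MINUTES);
    }

    String getToken() {
        return this.tokenRef.get(); // always valid; no regeneration race
    }

    private static String generateToken() {
        return "signed-jwt-goes-here"; // stand-in for real JWT signing
    }
}
```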
We cannot roll back to v0.13.10 because of the memory leak issue, nor to an older version, because we need the iOS 13 features. So we really rely on a new release to fix this `TooManyProviderTokenUpdates` error. Please let me know if you need more information for the fix. Looking forward to hearing back!