TooManyProviderTokenUpdates issue on v0.14.1 #811

Closed
zsophy111 opened this issue Sep 21, 2020 · 37 comments · Fixed by #850

@zsophy111

This issue is related to this one: #793

We are using Pushy for APNs push notifications in a complex enterprise application.
We originally used Pushy 0.13.10, but we kept running into native memory growth, so we upgraded to 0.14.1.
v0.14.1 solved the memory issue; however, we now hit this TooManyProviderTokenUpdates error, which we never saw on v0.13.10.

We think this issue comes from Pushy fix 747: d42f6d7
That fix proactively invalidates the APNs auth token (sets it to null) every 50 minutes by default, which makes the APNs client retrieve a new token after invalidation. If one APNs client retrieves a new auth token from the APNs server more than once in 20 minutes, the server returns this TooManyProviderTokenUpdates error.

In our application, we don't create an APNs client per thread; our APNs client is shared among threads.
The ApnsClient docs (https://pushy-apns.org/apidocs/0.14/com/eatthepath/pushy/apns/ApnsClient.html) say: "APNs clients are intended to be long-lived, persistent resources. They are also inherently thread-safe and can be shared across many threads in a complex application." So I think we're using it correctly.

So with fix 747, perhaps when the timer fires and the token is set to null, more than one thread tries to fetch a new token, which pushes us over the limit. A potential fix could be for the timer thread to fetch a new token and overwrite the old one rather than just setting it to null.
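
To illustrate the idea (a rough sketch with made-up names, not Pushy's actual internals): the refresh task builds the replacement token first and then swaps it in atomically, so readers never observe a null and never race to re-fetch.

import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

// Hypothetical sketch: the token is refreshed in place on a timer instead of
// being nulled out, so no caller ever has to trigger its own re-fetch.
final class RefreshingTokenHolder {
	private final AtomicReference<String> token;

	RefreshingTokenHolder(final Supplier<String> tokenFactory, final ScheduledExecutorService scheduler) {
		this.token = new AtomicReference<>(tokenFactory.get());

		// Every 50 minutes, generate the replacement first, then swap it in.
		scheduler.scheduleAtFixedRate(() -> token.set(tokenFactory.get()), 50, 50, TimeUnit.MINUTES);
	}

	String getToken() {
		return token.get(); // never null, so callers never race to re-fetch
	}
}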

We cannot roll back to v0.13.10 because of the memory leak, nor to older versions because we need the iOS 13 features, so we really depend on a new release to fix this TooManyProviderTokenUpdates error. Please let me know if you need more information for the fix. Looking forward to hearing back!

@jchambers
Owner

@zsophy111 If you turn on debug-level logging, TokenAuthenticationApnsClientHandler should log whenever it generates a new token:

log.debug("Generating new token for stream {}.", streamId);

If you haven't already, could you please turn on debug logging for (at least) TokenAuthenticationApnsClientHandler so we can learn more about how often the token is actually getting changed?
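
If it helps, and assuming Logback is your SLF4J backend (a sketch; adjust for whatever logging framework you actually use), you can raise the level for just that logger programmatically:

import org.slf4j.LoggerFactory;
import ch.qos.logback.classic.Level;
import ch.qos.logback.classic.Logger;

// Enable debug logging for the token-auth handler only.
((Logger) LoggerFactory.getLogger("com.eatthepath.pushy.apns.TokenAuthenticationApnsClientHandler")).setLevel(Level.DEBUG);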

…perhaps when the timer fires and the token is set to null, more than one thread tries to fetch a new token, which pushes us over the limit…

I can see why you'd think so reading the code, but there's a sneaky thing going on: all of this happens on a Netty "event loop," and everything on that event loop happens in order. In other words, one instance of a TokenAuthenticationApnsClientHandler should only ever get used by one thread. It may be that something is going wrong there, but the additional information from the logs will help us understand if that's the case.
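
As a tiny illustration of that ordering guarantee (plain Netty, nothing Pushy-specific): tasks submitted to a single event loop always run one at a time, in submission order, on the same thread.

import io.netty.channel.EventLoop;
import io.netty.channel.nio.NioEventLoopGroup;

public class EventLoopOrderingDemo {
	public static void main(final String[] args) {
		final NioEventLoopGroup group = new NioEventLoopGroup(1);
		final EventLoop eventLoop = group.next();

		// All three tasks print from the same thread, in order, regardless of
		// which thread submitted them.
		for (int i = 0; i < 3; i++) {
			final int task = i;
			eventLoop.execute(() -> System.out.println("Task " + task + " on " + Thread.currentThread().getName()));
		}

		group.shutdownGracefully();
	}
}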

@zsophy111
Author

zsophy111 commented Sep 22, 2020

Turning on debug logging will take some effort in our logging system.
But based on the observations below, "one instance of a TokenAuthenticationApnsClientHandler should only ever get used by one thread" is not what we see in our application.

In our application, we have only one APNs client initiated in each instance; however, on this instance, I see multiple nioEventLoopGroup threads sending APNs push notifications. For example, on the same instance, when thread nioEventLoopGroup-26-1 (apparently the main thread, which handles the majority of our traffic) fails to send a push because of TooManyProviderTokenUpdates, another thread, nioEventLoopGroup-12-1, can send a push successfully in the same time frame, which means nioEventLoopGroup-12-1 got the updated auth token faster than nioEventLoopGroup-26-1.

There is further evidence of this issue. We set the configurable APNs auth token expiration time to one week. On the same instance, we saw two different threads, nioEventLoopGroup-26-1 and nioEventLoopGroup-12-1, both emit the error log message APNs server reports token has expired, which means both of them would have re-fetched the auth token had we set the expiration time to less than 60 minutes.

Setting the configurable APNs auth token expiration time to one week or longer, so the token only expires passively, can be a temporary workaround for this issue, at least for our application. Pushy still automatically regenerates the auth token when the token "passively" expires, but because of the roughly one second of latency, we still lose 1-2 push notifications in that window, every 60 minutes on every instance.

@jchambers jchambers reopened this Sep 22, 2020
@jchambers
Owner

jchambers commented Sep 22, 2020

Oops. Sorry about the accidental issue closure; that was just a stray click.

Thanks for the additional detail! The threading discovery is very surprising (and alarming). I'll see what I can figure out.

@zsophy111
Author

zsophy111 commented Sep 22, 2020

By the way, we have also seen frequent Stream closed before a reply was received error logs since bumping to v0.14.1.
It looks like this was a closed issue: #561
Not sure whether it's related to this issue as well; it's just our observation.

java.io.IOException: Stream closed before a reply was received
	at com.eatthepath.pushy.apns.ApnsClientHandler.<clinit>(Unknown Source)
	at com.eatthepath.pushy.apns.TokenAuthenticationApnsClientHandler$TokenAuthenticationApnsClientHandlerBuilder.build(TokenAuthenticationApnsClientHandler.java:88)
	at com.eatthepath.pushy.apns.TokenAuthenticationApnsClientHandler$TokenAuthenticationApnsClientHandlerBuilder.build(TokenAuthenticationApnsClientHandler.java:60)
	at io.netty.handler.codec.http2.AbstractHttp2ConnectionHandlerBuilder.buildFromCodec(AbstractHttp2ConnectionHandlerBuilder.java:551)
	at io.netty.handler.codec.http2.AbstractHttp2ConnectionHandlerBuilder.buildFromConnection(AbstractHttp2ConnectionHandlerBuilder.java:540)
	at io.netty.handler.codec.http2.AbstractHttp2ConnectionHandlerBuilder.build(AbstractHttp2ConnectionHandlerBuilder.java:504)
	at com.eatthepath.pushy.apns.ApnsClientHandler$ApnsClientHandlerBuilder.build(ApnsClientHandler.java:145)
	at com.eatthepath.pushy.apns.ApnsChannelFactory$1.initChannel(ApnsChannelFactory.java:128)
	at com.eatthepath.pushy.apns.ApnsChannelFactory$1.initChannel(ApnsChannelFactory.java:100)
	at io.netty.channel.ChannelInitializer.initChannel(ChannelInitializer.java:129)
	at io.netty.channel.ChannelInitializer.handlerAdded(ChannelInitializer.java:112)
	at io.netty.channel.AbstractChannelHandlerContext.callHandlerAdded(AbstractChannelHandlerContext.java:971)
	at io.netty.channel.DefaultChannelPipeline.callHandlerAdded0(DefaultChannelPipeline.java:609)
	at io.netty.channel.DefaultChannelPipeline.access$100(DefaultChannelPipeline.java:46)
	at io.netty.channel.DefaultChannelPipeline$PendingHandlerAddedTask.execute(DefaultChannelPipeline.java:1463)
	at io.netty.channel.DefaultChannelPipeline.callHandlerAddedForAllHandlers(DefaultChannelPipeline.java:1115)
	at io.netty.channel.DefaultChannelPipeline.invokeHandlerAddedIfNeeded(DefaultChannelPipeline.java:650)
	at io.netty.channel.AbstractChannel$AbstractUnsafe.register0(AbstractChannel.java:502)
	at io.netty.channel.AbstractChannel$AbstractUnsafe.register(AbstractChannel.java:468)
	at io.netty.channel.SingleThreadEventLoop.register(SingleThreadEventLoop.java:87)
	at io.netty.channel.SingleThreadEventLoop.register(SingleThreadEventLoop.java:81)
	at io.netty.channel.MultithreadEventLoopGroup.register(MultithreadEventLoopGroup.java:86)
	at io.netty.bootstrap.AbstractBootstrap.initAndRegister(AbstractBootstrap.java:323)
	at io.netty.bootstrap.Bootstrap.doResolveAndConnect(Bootstrap.java:155)
	at io.netty.bootstrap.Bootstrap.connect(Bootstrap.java:116)
	at com.eatthepath.pushy.apns.ApnsChannelFactory.lambda$create$2(ApnsChannelFactory.java:176)
	at io.netty.util.concurrent.PromiseTask.runTask(PromiseTask.java:98)
	at io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:170)
	at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
	at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:500)
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.base/java.lang.Thread.run(Thread.java:834)

@jchambers
Owner

@zsophy111 I think we may have had a miscommunication. You wrote:

…we have only one APNs client initiated in each instance; however, on this instance, I see multiple nioEventLoopGroup threads sending APNs push notifications.

Does that client have multiple concurrent connections?

@zsophy111
Author

zsophy111 commented Sep 24, 2020

Hi Jon, thanks for the reply.

I am not sure whether they are "concurrent" or not. From the logs I have seen, on one instance in one of our datacenters, there are multiple threads doing push notifications. One thread handles 99% of the traffic, and 3-4 other threads handle the rest. I guess they share the same APNs client but push different messages. Hope this clarifies things...

The way we use ApnsClient is just the default configuration. I don't know much about Netty, but I can check our Netty settings if you think anything is related to the Netty configuration.

@jchambers
Owner

The way we use ApnsClient is just the default configuration.

Okay. I think that answers my question. By default, clients use a single connection for sending notifications.

From the logs I have seen, on one instance in one of our datacenters, there are multiple threads doing push notifications.

If we truly have multiple threads working on the same connection, that would be a serious bug. It's also behavior I can't reproduce in tests.

When a channel is created, it's bound to a Netty "event loop," and that event loop is bound to a single thread. All operations for that channel should happen on that event loop. Is it possible that one connection is getting closed, then another is getting reopened on a different event loop? In other words, are you seeing log messages from different threads interleaved, or do you see a block of messages from one thread followed by a block of messages from a different thread?

I'll add some more logging to make it more obvious when connections are opened or closed.

@zsophy111
Author

Hi Jon,

Unfortunately, I think what I see in the logs is the former case: log messages from different threads interleaved.

At millisecond granularity, nioEventLoopGroup-26-1 never stops sending pushes, while if I filter the nioEventLoopGroup-26-1 thread out, I see other threads occasionally sending pushes as well.

@zsophy111
Author

Hi Jon,
Any findings?

@jchambers
Owner

I'm not convinced that there's a threading problem here for two reasons:

  1. I've been entirely unable to reproduce the problem.
  2. If you're sending lots of notifications, it's entirely possible that you'll have enough of a backlog that you'll see notifications on a closed connection resolving even after a second connection has been opened. As an example, you could have 1,000 notifications sent and awaiting a reply from the APNs server. The first gets rejected with a token issue, and Pushy closes the connection and opens a fresh one. While the fresh one is opening and starting to send notifications, the original connection is still working through its backlog of notifications (assuming the replies from the server have already been received, but not yet processed).

In any case, it's extremely difficult to debug this issue without more information. Without a reproduction case and without debug logs giving some sense of how often we're generating new tokens, all I can do is guess. I don't even have a way to check if a proposed solution actually fixes the problem.

I understand it's not trivial for you to do, but for me to make any progress on this issue, I really need to know when you're generating new tokens. Thanks for your understanding.

@samlai810

Hi, I'm facing exactly the same issue on 0.14.1. We still don't have any idea what we might be doing wrong to cause it, so I will provide information in as much detail as possible. Hope it is not annoying.

Here is the debug log from TokenAuthenticationApnsClientHandler.
The log shows more than one event loop thread generating a new token at the same time. As I'm not familiar with Netty, may I ask whether that is normal behavior?

[DEBUG] 2020-10-15 12:18:05.695 [nioEventLoopGroup-2-5] TokenAuthenticationApnsClientHandler - Generating new token for stream 40481.
[DEBUG] 2020-10-15 12:18:25.078 [nioEventLoopGroup-2-11] TokenAuthenticationApnsClientHandler - Generating new token for stream 40485.
[DEBUG] 2020-10-15 12:19:59.315 [nioEventLoopGroup-2-16] TokenAuthenticationApnsClientHandler - Generating new token for stream 39875.
[DEBUG] 2020-10-15 13:08:05.765 [nioEventLoopGroup-2-5] TokenAuthenticationApnsClientHandler - Proactively expiring authentication token.
[DEBUG] 2020-10-15 13:08:25.086 [nioEventLoopGroup-2-11] TokenAuthenticationApnsClientHandler - Proactively expiring authentication token.
[DEBUG] 2020-10-15 13:09:59.356 [nioEventLoopGroup-2-16] TokenAuthenticationApnsClientHandler - Proactively expiring authentication token.
[DEBUG] 2020-10-15 13:58:29.332 [nioEventLoopGroup-2-17] TokenAuthenticationApnsClientHandler - Generating new token for stream 39857.
[DEBUG] 2020-10-15 14:00:17.471 [nioEventLoopGroup-2-1] TokenAuthenticationApnsClientHandler - Generating new token for stream 39769.
[DEBUG] 2020-10-15 14:34:41.225 [nioEventLoopGroup-2-9] TokenAuthenticationApnsClientHandler - Generating new token for stream 39889.
[DEBUG] 2020-10-15 14:34:41.225 [nioEventLoopGroup-2-3] TokenAuthenticationApnsClientHandler - Generating new token for stream 40469.
[DEBUG] 2020-10-15 14:36:34.601 [nioEventLoopGroup-2-3] TokenAuthenticationApnsClientHandler - Generating new token for stream 39805.
[DEBUG] 2020-10-15 14:36:34.606 [nioEventLoopGroup-2-8] TokenAuthenticationApnsClientHandler - Generating new token for stream 39837.
[DEBUG] 2020-10-15 14:36:35.607 [nioEventLoopGroup-2-19] TokenAuthenticationApnsClientHandler - Generating new token for stream 39925.
[DEBUG] 2020-10-15 14:36:35.640 [nioEventLoopGroup-2-6] TokenAuthenticationApnsClientHandler - Generating new token for stream 39825.
[DEBUG] 2020-10-15 14:36:37.107 [nioEventLoopGroup-2-5] TokenAuthenticationApnsClientHandler - Generating new token for stream 39773.
[DEBUG] 2020-10-15 14:36:37.114 [nioEventLoopGroup-2-2] TokenAuthenticationApnsClientHandler - Generating new token for stream 39881.
[DEBUG] 2020-10-15 14:36:37.964 [nioEventLoopGroup-2-9] TokenAuthenticationApnsClientHandler - Generating new token for stream 40487.
[DEBUG] 2020-10-15 14:36:37.966 [nioEventLoopGroup-2-4] TokenAuthenticationApnsClientHandler - Generating new token for stream 39939.
[DEBUG] 2020-10-15 14:48:29.364 [nioEventLoopGroup-2-17] TokenAuthenticationApnsClientHandler - Proactively expiring authentication token.
[DEBUG] 2020-10-15 14:50:17.497 [nioEventLoopGroup-2-1] TokenAuthenticationApnsClientHandler - Proactively expiring authentication token.
[DEBUG] 2020-10-15 15:02:11.381 [nioEventLoopGroup-2-13] TokenAuthenticationApnsClientHandler - Generating new token for stream 40093.
[DEBUG] 2020-10-15 15:02:11.402 [nioEventLoopGroup-2-7] TokenAuthenticationApnsClientHandler - Generating new token for stream 40459.
[DEBUG] 2020-10-15 15:02:18.225 [nioEventLoopGroup-2-13] TokenAuthenticationApnsClientHandler - Generating new token for stream 39709.
[DEBUG] 2020-10-15 15:02:18.232 [nioEventLoopGroup-2-1] TokenAuthenticationApnsClientHandler - Generating new token for stream 39741.
[DEBUG] 2020-10-15 15:02:19.805 [nioEventLoopGroup-2-5] TokenAuthenticationApnsClientHandler - Generating new token for stream 40483.
[DEBUG] 2020-10-15 15:02:19.805 [nioEventLoopGroup-2-11] TokenAuthenticationApnsClientHandler - Generating new token for stream 39819.
[DEBUG] 2020-10-15 15:02:21.332 [nioEventLoopGroup-2-11] TokenAuthenticationApnsClientHandler - Generating new token for stream 40487.
[DEBUG] 2020-10-15 15:02:21.349 [nioEventLoopGroup-2-16] TokenAuthenticationApnsClientHandler - Generating new token for stream 39877.
[DEBUG] 2020-10-15 15:02:21.896 [nioEventLoopGroup-2-17] TokenAuthenticationApnsClientHandler - Generating new token for stream 39859.
[DEBUG] 2020-10-15 15:02:21.900 [nioEventLoopGroup-2-1] TokenAuthenticationApnsClientHandler - Generating new token for stream 39771.
[DEBUG] 2020-10-15 15:24:41.255 [nioEventLoopGroup-2-9] TokenAuthenticationApnsClientHandler - Proactively expiring authentication token.
[DEBUG] 2020-10-15 15:24:41.280 [nioEventLoopGroup-2-3] TokenAuthenticationApnsClientHandler - Proactively expiring authentication token.
[DEBUG] 2020-10-15 15:26:34.621 [nioEventLoopGroup-2-3] TokenAuthenticationApnsClientHandler - Proactively expiring authentication token.
[DEBUG] 2020-10-15 15:26:34.651 [nioEventLoopGroup-2-8] TokenAuthenticationApnsClientHandler - Proactively expiring authentication token.
[DEBUG] 2020-10-15 15:26:35.661 [nioEventLoopGroup-2-19] TokenAuthenticationApnsClientHandler - Proactively expiring authentication token.
[DEBUG] 2020-10-15 15:26:35.693 [nioEventLoopGroup-2-6] TokenAuthenticationApnsClientHandler - Proactively expiring authentication token.
[DEBUG] 2020-10-15 15:26:37.124 [nioEventLoopGroup-2-5] TokenAuthenticationApnsClientHandler - Proactively expiring authentication token.
[DEBUG] 2020-10-15 15:26:37.166 [nioEventLoopGroup-2-2] TokenAuthenticationApnsClientHandler - Proactively expiring authentication token.
[DEBUG] 2020-10-15 15:26:38.009 [nioEventLoopGroup-2-4] TokenAuthenticationApnsClientHandler - Proactively expiring authentication token.
[DEBUG] 2020-10-15 15:26:38.013 [nioEventLoopGroup-2-9] TokenAuthenticationApnsClientHandler - Proactively expiring authentication token.
[DEBUG] 2020-10-15 15:52:11.430 [nioEventLoopGroup-2-13] TokenAuthenticationApnsClientHandler - Proactively expiring authentication token.
[DEBUG] 2020-10-15 15:52:11.456 [nioEventLoopGroup-2-7] TokenAuthenticationApnsClientHandler - Proactively expiring authentication token.
[DEBUG] 2020-10-15 15:52:18.239 [nioEventLoopGroup-2-13] TokenAuthenticationApnsClientHandler - Proactively expiring authentication token.
[DEBUG] 2020-10-15 15:52:18.267 [nioEventLoopGroup-2-1] TokenAuthenticationApnsClientHandler - Proactively expiring authentication token.
[DEBUG] 2020-10-15 15:52:19.833 [nioEventLoopGroup-2-5] TokenAuthenticationApnsClientHandler - Proactively expiring authentication token.
[DEBUG] 2020-10-15 15:52:19.850 [nioEventLoopGroup-2-11] TokenAuthenticationApnsClientHandler - Proactively expiring authentication token.
[DEBUG] 2020-10-15 15:52:21.337 [nioEventLoopGroup-2-11] TokenAuthenticationApnsClientHandler - Proactively expiring authentication token.
[DEBUG] 2020-10-15 15:52:21.362 [nioEventLoopGroup-2-16] TokenAuthenticationApnsClientHandler - Proactively expiring authentication token.
[DEBUG] 2020-10-15 15:52:21.902 [nioEventLoopGroup-2-17] TokenAuthenticationApnsClientHandler - Proactively expiring authentication token.
[DEBUG] 2020-10-15 15:52:21.907 [nioEventLoopGroup-2-1] TokenAuthenticationApnsClientHandler - Proactively expiring authentication token.
[DEBUG] 2020-10-15 17:06:15.587 [nioEventLoopGroup-2-3] TokenAuthenticationApnsClientHandler - Generating new token for stream 40471.
[DEBUG] 2020-10-15 17:09:44.528 [nioEventLoopGroup-2-9] TokenAuthenticationApnsClientHandler - Generating new token for stream 39891.
[DEBUG] 2020-10-15 17:28:03.524 [nioEventLoopGroup-2-3] TokenAuthenticationApnsClientHandler - Generating new token for stream 39807.
[DEBUG] 2020-10-15 17:30:32.183 [nioEventLoopGroup-2-8] TokenAuthenticationApnsClientHandler - Generating new token for stream 39839.
[DEBUG] 2020-10-15 17:39:38.292 [nioEventLoopGroup-2-19] TokenAuthenticationApnsClientHandler - Generating new token for stream 39927.
[DEBUG] 2020-10-15 17:50:02.771 [nioEventLoopGroup-2-6] TokenAuthenticationApnsClientHandler - Generating new token for stream 39827.
[DEBUG] 2020-10-15 17:56:15.597 [nioEventLoopGroup-2-3] TokenAuthenticationApnsClientHandler - Proactively expiring authentication token.
[DEBUG] 2020-10-15 17:59:44.549 [nioEventLoopGroup-2-9] TokenAuthenticationApnsClientHandler - Proactively expiring authentication token.
[DEBUG] 2020-10-15 18:18:03.534 [nioEventLoopGroup-2-3] TokenAuthenticationApnsClientHandler - Proactively expiring authentication token.
[DEBUG] 2020-10-15 18:20:32.236 [nioEventLoopGroup-2-8] TokenAuthenticationApnsClientHandler - Proactively expiring authentication token.
[DEBUG] 2020-10-15 18:29:38.332 [nioEventLoopGroup-2-19] TokenAuthenticationApnsClientHandler - Proactively expiring authentication token.
[DEBUG] 2020-10-15 18:40:02.801 [nioEventLoopGroup-2-6] TokenAuthenticationApnsClientHandler - Proactively expiring authentication token.

The diagram below shows my infrastructure. Is it appropriate to let two servers share one outbound IP?

[diagram: two servers sharing a single outbound IP]

Hope it is useful for the investigation. Thanks.

@jchambers
Owner

@supral Thank you, and please pardon the delay. This is helpful. Can you please clarify:

  1. How many threads are in your event loop group?
  2. How many concurrent connections does your client have?

"I'm just using the defaults" is a perfectly good answer to either question. Thanks!

@jchambers
Owner

Also: I just opened (and will soon merge) #820, which addresses something I think may have been keeping old connections in play longer than intended. That may resolve some or all of the noise in the logs, but I don't think it'll address the "too many updates" issue.

@jchambers
Owner

Folks, Pushy 0.14.2 has just been released and should be making its way to Maven Central within the next hour or so. Could you please update to the latest version? It includes a few logging changes that could be helpful in getting to the root of this problem. In particular:

  • Turning on DEBUG logging for com.eatthepath.pushy.apns.ApnsChannelFactory will help us understand when channels are opened/closed
  • The existing logging around cycling authentication tokens now identifies which channels were affected by the token change

Thank you!

@samlai810

@supral Thank you, and please pardon the delay. This is helpful. Can you please clarify:

  1. How many threads are in your event loop group?
  2. How many concurrent connections does your client have?

"I'm just using the defaults" is a perfectly good answer to either question. Thanks!

This is how I initialize the APNs client. Hope this answers your question. Thanks

// 20 event-loop threads and 20 concurrent connections, so each connection
// effectively gets its own thread.
EventLoopGroup eventLoopGroup = new NioEventLoopGroup(20);
ApnsClient client = new ApnsClientBuilder()
	.setApnsServer(ApnsClientBuilder.PRODUCTION_APNS_HOST)
	.setSigningKey(ApnsSigningKey.loadFromPkcs8File(new File(keyPath), teamId, keyId))
	.setConcurrentConnections(20)
	.setEventLoopGroup(eventLoopGroup)
	.setMetricsListener(new DropwizardApnsClientMetricsListener())
	.setConnectionTimeout(Duration.ofSeconds(60))
	.setFrameLogger(new Http2FrameLogger(LogLevel.DEBUG))
	.build();

@jchambers
Owner

@supral In your case, you have 20 connections supported by 20 different threads. I don't think it's surprising or alarming to see multiple threads resetting tokens at about the same time (since all 20 connections were likely first created at about the same time).

@janzar

janzar commented Nov 3, 2020

Hi, last night on 0.14.1 one of our apps also got several hundred TooManyProviderTokenUpdates errors per server in a single spike. Today we updated all servers to 0.14.2 and turned on logging for TokenAuthenticationApnsClientHandler. If the problem repeats, I will follow up with more information.

@jchambers
Owner

Thanks! The logs from 0.14.2 will identify which connections are updating their tokens and when; that information will be crucial for getting to the bottom of this problem.

@janzar

janzar commented Nov 10, 2020

Hello, an update: this morning (local time) we again had a lot of pushes without responses. :/ At around 09:55-10:00 GMT+3 we saw about 100 TooManyProviderTokenUpdates errors on two servers. I checked the logs: "Generating new token for stream..." was logged for any given channel once every 50 minutes (well, sometimes 3001 or 2999 seconds, but that doesn't matter). So it seems to be an upstream issue.

@janzar

janzar commented Nov 24, 2020

Hi, today we again got TooManyProviderTokenUpdates on one server. The logs again show that generating a new token on the same connection happens no more often than every 50 minutes. After that, for about 1.5 hours, we got a lot of pushes without responses (at least within the time we wait for them). Then we got an Http2ChannelClosedException, one connection closed, and everything started working normally again. This is Pushy 0.14.2.

@jchambers
Owner

Ugh… well, that's both confusing and frustrating. Thank you for the update.

After that, for about 1.5 hours, we got a lot of pushes without responses (at least within the time we wait for them).

Two clarifications here:

  1. Were those notifications all on the same connection? If the server decides that a connection is "bad," it can send an HTTP/2 SETTINGS frame that limits the number of concurrent streams to just 1 (down from a "normal" limit of 1,500 to 2,500).
  2. How long are you waiting for a response?

@janzar

janzar commented Nov 24, 2020

Hi, I'll answer tomorrow, because I'm away from my work PC right now.
Thanks,
Z.

@janzar

janzar commented Nov 25, 2020

  1. I can't tell whether these pushes were on one connection or several, because we don't log all that information. It would be great if PushNotificationResponse or the ApnsClientMetricsListener methods contained some information about the channel (or the channel itself); then it would be easier to log channels on errors. But since one connection was closed at the end, my guess is that they all went through one connection. For one channel, I see in the logs:

Nov 24 16:06:50 DEBUG ... Generating new token for stream 448259 on channel [id: 0x6ba7eb6b...]
Nov 24 16:06:56 INFO ... [id: 0x6ba7eb6b...] INBOUND SETTINGS: ack=false settings={HEADER_TABLE_SIZE=4096, MAX_CONCURRENT_STREAMS=1, INITIAL_WINDOW_SIZE=65535, MAX_FRAME_SIZE=16384, MAX_HEADER_LIST_SIZE=8000}
Nov 24 16:56:50 ... DEBUG Generating new token for stream 522045 on channel [id: 0x6ba7eb6b...]
Nov 24 17:26:57 WARN ... APNs server reports token for channel [id: 0x6ba7eb6b ...] has expired.
Nov 24 17:26:57 INFO ... [id: 0x6ba7eb6b...] OUTBOUND GO_AWAY: lastStreamId=0 errorCode=2 length=0 bytes=
Nov 24 17:26:57 DEBUG ... Discarded channel [id: 0x6ba7eb6b ...]

For the other channels in that time frame, the logs only show messages about generating new tokens.

When there were timeouts, I see calls to ApnsClientMetricsListener::handleWriteFailure; from the code, that looks like confirmation that no response was received. Also, from the charts, Http2FrameLogger::logPing was not called during that time, if that helps.

  2. On these servers we wait 10 seconds for a response.

Thanks,
Z.

@janzar

janzar commented Nov 25, 2020

From the logs I see that, on connection establishment, the server sends MAX_CONCURRENT_STREAMS=1 and shortly afterwards MAX_CONCURRENT_STREAMS=1000 (at least in our case). Maybe, then, if Pushy receives MAX_CONCURRENT_STREAMS=1 and does not receive MAX_CONCURRENT_STREAMS>1 within, say, a minute after that, it should close such connections?
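
Something like this, as a very rough sketch (this uses Netty's HTTP/2 frame-codec style purely for illustration; it is not how Pushy is wired internally):

import java.util.concurrent.TimeUnit;

import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.SimpleChannelInboundHandler;
import io.netty.handler.codec.http2.Http2SettingsFrame;
import io.netty.util.concurrent.ScheduledFuture;

// Hypothetical sketch: close a connection that stays pinned at
// MAX_CONCURRENT_STREAMS=1 for more than a minute.
final class StuckSettingsWatchdog extends SimpleChannelInboundHandler<Http2SettingsFrame> {
	private ScheduledFuture<?> closeTask;

	@Override
	protected void channelRead0(final ChannelHandlerContext ctx, final Http2SettingsFrame settingsFrame) {
		final Long maxConcurrentStreams = settingsFrame.settings().maxConcurrentStreams();

		if (maxConcurrentStreams != null && maxConcurrentStreams == 1) {
			if (closeTask == null) {
				closeTask = ctx.executor().schedule(ctx::close, 1, TimeUnit.MINUTES);
			}
		} else if (closeTask != null) {
			// A healthier limit arrived in time; cancel the pending close.
			closeTask.cancel(false);
			closeTask = null;
		}
	}
}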

Thanks,
Z.

@janzar

janzar commented Nov 27, 2020

Hello, today on one server, from around 14:05 to 14:55 (the granularity of this info is about 5 minutes), we again got a lot of TooManyProviderTokenUpdates errors. In the logs I found only one suspicious connection in that time frame:

Nov 27 14:09:54 ... [id: 0xd518d1b3...] INBOUND SETTINGS: ack=false settings={HEADER_TABLE_SIZE=4096, MAX_CONCURRENT_STREAMS=1, INITIAL_WINDOW_SIZE=65535, MAX_FRAME_SIZE=16384, MAX_HEADER_LIST_SIZE=8000}
Nov 27 14:59:45 ... [id: 0xd518d1b3...] INBOUND SETTINGS: ack=false settings={HEADER_TABLE_SIZE=4096, MAX_CONCURRENT_STREAMS=1000, INITIAL_WINDOW_SIZE=65535, MAX_FRAME_SIZE=16384, MAX_HEADER_LIST_SIZE=8000}

In this case the channel recovered by itself. There were also no timeouts for push responses during that time. The logs again show that generating a new token on the same connection happened no more often than every 50 minutes.

Thanks,
Z.

@extraSix

@supral In your case, you have 20 connections supported by 20 different threads. I don't think it's surprising or alarming to see multiple threads resetting tokens at about the same time (since all 20 connections were likely first created at about the same time).

Hi, I think it might be better if the multiple threads of an ApnsClient shared the same token.

@jchambers
Owner

Hi, I think it might be better if the multiple threads of an ApnsClient shared the same token.

To be clear, there's no indication in any of the official documentation that this should be necessary, but I agree that it can't hurt. I took a shot at this in #850.
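
The rough shape of the change (a simplified sketch, not the actual #850 diff): all connections of a client read one shared token, and whichever connection notices expiration regenerates it exactly once.

import java.time.Duration;
import java.time.Instant;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

// Simplified sketch: one token shared by every connection of a client.
final class SharedTokenProvider {
	private static final Duration MAX_AGE = Duration.ofMinutes(50);

	private final Supplier<String> tokenFactory;
	private final AtomicReference<TimestampedToken> current = new AtomicReference<>();

	SharedTokenProvider(final Supplier<String> tokenFactory) {
		this.tokenFactory = tokenFactory;
	}

	String getToken() {
		final TimestampedToken existing = current.get();

		if (existing == null || existing.issued.plus(MAX_AGE).isBefore(Instant.now())) {
			final TimestampedToken fresh = new TimestampedToken(tokenFactory.get(), Instant.now());

			// If another connection won the race, use its token instead of ours.
			return current.compareAndSet(existing, fresh) ? fresh.token : current.get().token;
		}

		return existing.token;
	}

	private static final class TimestampedToken {
		final String token;
		final Instant issued;

		TimestampedToken(final String token, final Instant issued) {
			this.token = token;
			this.issued = issued;
		}
	}
}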

@jchambers
Owner

Oops—I shouldn't have closed this issue. My mistake. I'll put out a new release with shared tokens shortly, then ideally close this ticket after an observation period.

@jchambers jchambers reopened this Mar 10, 2021
@janzar

janzar commented Mar 11, 2021

We had a discussion with Apple support. One of the proposed solutions was to close the connection and open another one whenever TooManyProviderTokenUpdates is received. Since the new year we have gotten a few dozen of them, on 27.1 and 2.3.

@jchambers
Owner

One of the proposed solutions was to close the connection and open another one whenever TooManyProviderTokenUpdates is received.

Well, that's… frustrating. We can certainly add logic for that case, too.

To be clear, I'm interpreting that suggestion to mean that a connection getting TooManyProviderTokenUpdates errors may just be in some kind of bad state, and restarting the connection can fix it. Am I understanding correctly?

@janzar

janzar commented Mar 12, 2021

The root cause of this problem is not clear. I've sent you an email with more details.

@jchambers
Owner

Thank you for the email; I think I'm convinced that we should try closing/reopening connections when we see TooManyProviderTokenUpdates, if only as a precautionary measure.

@jchambers
Owner

This should be fixed in 0.15; if the problem persists after updating, please let us know and we'll re-open this issue.

@tschuchortdev

@jchambers We've run into this issue again with version 0.15. About 4,000 notifications failed with TooManyProviderTokenUpdates within a few seconds of each other. Is it possible that this is a user error rather than a bug in the library? We have only one instance of ApnsClient per service, and there was just a single replica running at the time, with no deployments or anything like that. The number of concurrent connections, timeouts, etc. are all set to the defaults.

@jchambers
Owner

Honestly, my suspicion at this point is that it's an upstream issue. Did the problem self-resolve for you?

@tschuchortdev

I believe so, but it's hard to tell because the service is deployed, and thus restarted, frequently for various reasons anyway. For now we will add notifications that failed with this error to the retry queue we already have, and hopefully that works around the problem. Perhaps it would make sense to document this phenomenon, because the docs for ApnsClient.sendNotification explicitly say that non-exceptional errors (notification rejected by the gateway) are not retryable.
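
For reference, the retry hook is roughly this (illustrative; retryQueue is our own component, and getRejectionReason() returns an Optional<String> on the Pushy version we use, which may differ on others):

import java.util.Queue;

import com.eatthepath.pushy.apns.ApnsClient;
import com.eatthepath.pushy.apns.util.SimpleApnsPushNotification;

final class RetryOnTokenUpdates {
	static void send(final ApnsClient client, final SimpleApnsPushNotification notification,
			final Queue<SimpleApnsPushNotification> retryQueue) {
		client.sendNotification(notification).whenComplete((response, cause) -> {
			if (cause != null) {
				// Exceptional failure (e.g. connection trouble): always retryable.
				retryQueue.add(notification);
			} else if (!response.isAccepted() && response.getRejectionReason()
					.map("TooManyProviderTokenUpdates"::equals).orElse(false)) {
				// Nominally "not retryable," but we retry this specific rejection.
				retryQueue.add(notification);
			}
		});
	}
}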

@jchambers
Owner

Perhaps it would make sense to document this phenomenon, because the docs for ApnsClient.sendNotification explicitly say that non-exceptional errors (notification rejected by the gateway) are not retryable.

That seems reasonable. Depending on what's up, it may also make sense to just automatically retry them. Let me see if I can learn a little more about what's going on here, then figure out what to do about it.
