APNs server does not respond to some notifications #816
Comments
Hi @jchambers , We are facing the same issue as stated above. Unfortunately, we do not set apnsId on our notification since we build them with Nevertheless, we have a message-id in the payload to follow a given message sent to APNs (I can DM you some examples on Twitter if needed). We got 3k+ notification timeouts (5-second timeout on the get) on one cluster last Wednesday, and we managed to work around the issue by re-initializing the connection as follows:
Our environment: pushy: 0.13.10, jdk: 1.8 The issues started at the beginning of this week and concern all iOS users (iOS 12, 13 and 14). Thanks! |
Thank you for the offer, but I'm almost positive that in-payload IDs won't be searchable by Apple. If you have an opportunity to include a UUID in your outbound notifications, we can use that going forward: new SimpleApnsPushNotification(token, topic, payload, invalidationTime, priority, pushType, null, UUID.randomUUID()) If you get an opportunity to give that a go, please do let me know and we'll figure things out from there. Thanks! |
Hi @jchambers might this swift-server-community/APNSwift#93 be the same cause? |
It certainly sounds like it's the same problem, yes. |
@jchambers Also, if I set a timeout in future.get, we will receive a TimeoutException once the timeout elapses. Would it be a solution to catch the exception, create a new APNs client, and send the same notification again with the new client? Please share your input. Thank you. |
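For readers following along, here is a minimal sketch of the timeout check being asked about. It uses only the JDK: a plain CompletableFuture stands in for the future returned by a send call, and the names SendOutcomeSketch, Outcome and awaitResponse are hypothetical, not part of Pushy. It only classifies the outcome; whether rebuilding the client and resending is an adequate workaround is not confirmed anywhere in this thread.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, not part of Pushy: classifies what happened to one send.
final class SendOutcomeSketch {
    enum Outcome { COMPLETED, FAILED, STALLED, INTERRUPTED }

    // Waits up to timeoutMillis for the future to resolve and reports the result.
    static Outcome awaitResponse(Future<?> sendFuture, long timeoutMillis) {
        try {
            sendFuture.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return Outcome.COMPLETED;   // a response arrived
        } catch (TimeoutException e) {
            return Outcome.STALLED;     // no response in time; the caller may decide to reconnect
        } catch (ExecutionException e) {
            return Outcome.FAILED;      // the send itself failed
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            return Outcome.INTERRUPTED;
        }
    }

    public static void main(String[] args) {
        // Stand-ins for a notification that was answered and one that never was.
        CompletableFuture<String> answered = CompletableFuture.completedFuture("accepted");
        CompletableFuture<String> neverAnswered = new CompletableFuture<>();
        System.out.println(awaitResponse(answered, 50));      // prints COMPLETED
        System.out.println(awaitResponse(neverAnswered, 50)); // prints STALLED
    }
}
```

A caller could count STALLED outcomes and rebuild its client once some threshold is crossed, which is roughly the workaround several commenters describe.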
Hi @jchambers, Thank you for your reply. I will add the apns-id to the notification creation call. We have got some additional information to share. While the first encountered exception was a timeout :
We have found in another deployment a different error:
In fact, the service has been restarted in order to re-initialize the connection:
But we had to wait more than 30 seconds until the service was able to be stopped:
The provided trace Further restarts were more straightforward:
@jchambers Is it normal for client shutdown to be locked for more than half a minute ? Thanks! |
Hi @jchambers We're experiencing the same issue on our platform. As a workaround we periodically re-initialize the connection, but this is definitely not a viable solution. Are you still investigating this? Is there any news you can share? Thanks! |
Yes, I'm still investigating the issue. Thank you for your patience. |
Hi @jchambers I have the same problem. After the APNs connection has been working for an hour, I can no longer get a response from APNs and the user's device does not receive the message. I send messages with new SimpleApnsPushNotification(token, topic, payload, invalidationTime, priority, pushType, null, UUID.randomUUID()) and the message is not received. UUID
|
Hi @jchambers , Since we added the timeout to future.get, we have started getting the timeout exception. We catch the exception and, in the catch block, try to cancel the future task as shown in the code snippet below. Sorry for the inconvenience. catch (Exception e) {
|
Pushy generally hasn't added support for canceling futures. Part of the problem is that once a notification has been written, it has consumed an HTTP/2 stream. Even if we cancel the future locally, streams are limited by the server, and we won't be able to "reclaim" the stream until the notification resolves remotely. |
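The point about streams can be illustrated with a toy model. This is only a sketch of the behaviour described in the comment above, not Pushy or HTTP/2 library code; StreamBudgetModel and its methods are hypothetical names. Cancelling locally marks the stream as abandoned but does not free a slot; only a remote resolution does.

```java
import java.util.HashSet;
import java.util.Set;

// Toy model (hypothetical, not Pushy code) of a server-imposed concurrent stream limit.
final class StreamBudgetModel {
    private final int maxConcurrentStreams;
    private final Set<Integer> openStreams = new HashSet<>();
    private final Set<Integer> locallyCancelled = new HashSet<>();
    private int nextStreamId = 1;

    StreamBudgetModel(int maxConcurrentStreams) {
        this.maxConcurrentStreams = maxConcurrentStreams;
    }

    // Writing a notification consumes a stream; returns its ID, or -1 if the limit is reached.
    int write() {
        if (openStreams.size() >= maxConcurrentStreams) {
            return -1;
        }
        int id = nextStreamId;
        nextStreamId += 2; // client-initiated HTTP/2 streams use odd identifiers
        openStreams.add(id);
        return id;
    }

    // The caller stops waiting, but in this model the stream still counts against the limit.
    void cancelLocally(int streamId) {
        locallyCancelled.add(streamId);
    }

    // Only a response (or close) from the server frees the slot.
    void resolveRemotely(int streamId) {
        openStreams.remove(streamId);
        locallyCancelled.remove(streamId);
    }

    int availableStreams() {
        return maxConcurrentStreams - openStreams.size();
    }
}
```

In this model, a connection whose streams never resolve remotely eventually has no capacity left, which matches the reports of services that stop sending until the connection is re-initialized.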
Thank you for the explanation. Since we are continuously facing this issue, could you suggest a workaround while you are investigating, if there is one? |
hi @jchambers At present, we still face the following problem: with asynchronous requests, our APNs push service can only run for two hours. We are now planning to do a synchronous transaction every 10 minutes; if the transaction times out, we will close the connection and reinitialize it. But we don't know whether this will solve the problem. Can you give us some suggestions? Thank you very much! |
Friends, I understand this is a serious problem for many of you. I promise I'll share updates as soon as they're available. |
@jchambers Did you open a radar issue for that? I opened FB8816555 but would reference your issue to point to a possible duplicate... |
I didn't; I've been working through other channels. |
Folks, I'll be putting out a new build shortly that includes some more verbose/specific logging that I hope will help get to the bottom of this issue. I don't think it will solve the problem in its own right, but it should help get us the information we need to make more progress. |
Folks, Pushy 0.14.2 has just been released and should be making its way to Maven Central within the next hour or so. Could you please update to the latest version? It includes a few logging changes that could be helpful in getting to the root of this problem. In particular, turning on Thank you! |
With thanks to @lkesteloot and @dcollens, we now have some high-quality frame logs showing:
Another curious observation is that this problem seems to affect a small subset of device tokens. As an example, most "stalled" notifications are headed for device tokens |
@jchambers we can confirm exactly that behavior. We already talked to Apple, and they told us that those requests never hit their application layer; their server team is now investigating. |
@papo2608 Did Apple say when they will fix it? |
Looks like this issue is the same as #787; we have been experiencing this on and off for the last 6 months at least. We resorted to expiring the future and recreating the ApnsClient when messages are not being sent. |
Folks, I've just received word that the problem may have been fixed upstream. Could you please check whether you're still experiencing this problem or if it appears to have been resolved and report back? Thanks! |
It seems so. Our platform had been experiencing this issue since mid-September, and we had to reinitialize the client 3 or 4 times a day. But we have just run 4 days in a row without any notification hitches. |
@jchambers , |
Yes, I mean that I believe Apple fixed a problem on their end. |
Is it possible to reproduce the same event(APNs server does not respond) using this mock server? Thanks! |
Respectfully, I think that's beyond the scope of the current issue and sounds more like a request for general support. @leiwei999 could you please move this discussion to the mailing list instead? |
Hi @jchambers, Jon, we are currently on version 0.13.9. We are still facing the issue. We are restarting our service periodically to make sure the streams do not all end up blocked.
Thanks In Advance Jon |
@babuv2 We opened FB8816555 issue with Apple for this. It is still open on Apple's side and it does not reference any duplicates. Updating the Pushy version is a good idea anyway but as for this issue, I don't think it will help. |
@petrdvorak Great. Understood. Thank you very much Petr Vivek |
Folks, If you're still experiencing this issue, please:
For recurrences of this issue, we need to be able to show which notifications are getting lost (a timestamp and APNs ID will cover this) and what else was happening on the connection at the time (the frame logs will cover this). Thank you very much for your continued patience and support! |
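One way to collect the timestamps and APNs IDs asked for here is to track in-flight notifications on the sending side. The following is a minimal JDK-only sketch under that assumption; InFlightTracker and its methods are hypothetical names, and wiring recordResponse into the send future's completion handler is left to the application.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical bookkeeping helper, not part of Pushy.
final class InFlightTracker {
    private final Map<UUID, Instant> sentAt = new ConcurrentHashMap<>();

    // Call when a notification is written; returns the apns-id to attach to it.
    UUID recordSend(Instant now) {
        UUID apnsId = UUID.randomUUID();
        sentAt.put(apnsId, now);
        return apnsId;
    }

    // Call when the server responds (accepted or rejected) for this apns-id.
    void recordResponse(UUID apnsId) {
        sentAt.remove(apnsId);
    }

    // Lists notifications still unanswered after the threshold, as "apns-id sent at UTC-timestamp".
    List<String> stalledReport(Instant now, Duration threshold) {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<UUID, Instant> entry : sentAt.entrySet()) {
            if (Duration.between(entry.getValue(), now).compareTo(threshold) >= 0) {
                lines.add(entry.getKey() + " sent at " + entry.getValue());
            }
        }
        return lines;
    }
}
```

Instant prints in UTC (ISO-8601), which avoids ambiguity about time zones when the list is passed along with frame logs.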
Hello, we too experienced a lot of timeouts on 27, 28 and 30 October. We have already updated Pushy to 0.14.2. Some apns-id: Time is in GMT+03. If we retry a push, we use the same apns-id. We hope this will not cause any problems in identifying them. Thanks, |
I was also experiencing this issue at my job. It seemed to increase with higher traffic. At the time, we were sending around 300 - 350 notifications per minute and our JVMs were locking up. Updating Pushy has seemed to resolve the issue but will monitor closely. |
Even though there are still some reports of occasional timeouts, it sounds like the main issue where HTTP/2 streams were getting lost entirely has been resolved upstream. I'm going to mark this issue as "resolved," but please let me know if you think that's a mistake. |
Thank you Jon for dealing with this! |
We have been seeing a ton of timeouts today, versus a handful on a normal day. We are still on an older release of Pushy (0.11.0). Is anyone else experiencing these timeouts today? |
Yeah looks like we have about 25 stranded requests today (they never completed, and have consumed our Semaphore count). I don't have frame logs for them though. Around 1-2pm Eastern time roughly. |
Ok thanks, problem with upstream? For us, it started around 11:14am Eastern time and still happening |
Uuuuuuuugh… while I don't doubt that this is the same problem, frame logs and UUIDs of affected messages would be a big help in diagnosing this (even if that just means forwarding that information to Apple). I do hope this is just a brief hiccup that self-resolves, but if anybody's in a position to capture logs, that'd be awesome. In the meantime, I'll try to think through some appropriate resiliency strategies for "sometimes HTTP/2 streams just disappear." |
Thank you Jon, we were running fine yesterday and returned to our normal error rate. Would upgrading to the newest version help improve the situation? |
I always recommend using the latest version, but we haven't shipped anything that addresses this situation specifically. |
ok thanks |
@jchambers Jon we have
We are still facing the issue on a regular basis. Please find below some of the failed apns-ids 5eb1bdac-f632-43d7-9060-8b77473c5f64 Any help in debugging/fixing this is highly appreciated. Thanks In Advance Vivek |
Hi @jchambers, Thanks In Advance |
We've received reports that, starting on or around September 19, 2020, APNs servers have stopped responding to some notifications. From Pushy's perspective, this can look like a Future that never resolves (completion handlers are never called and calls to .get() time out or wait forever). Please see the mailing list thread on this topic for additional background and discussion.
From HTTP/2 frame logs, the problem appears to be that the server simply never sends a HEADERS (or DATA) frame in response to a push notification and never closes the HTTP/2 stream associated with the notification.
At this point, the goal is to identify some specific notifications affected by this problem. If you've encountered this issue, we're hoping to get the UUIDs (apns-id) and approximate timestamps of some affected notifications in the interest of sharing information upstream. Because the problem is that the server isn't responding, you'll need to assign your own apns-id values to outbound notifications (SimpleApnsPushNotification has a pair of constructors that accept an apnsId argument; using UUID.randomUUID() is recommended) to be able to uniquely identify which notifications are having this problem.
This issue is intended to consolidate a number of other reports on this topic, including #807, #814, and #815.