Shannon connection is not re-established on failure #88
Comments
I have observed this myself too. It was working fine before, so I think it's a regression introduced by #69. I have not debugged this yet because I am not able to reproduce it consistently; it just happens randomly on my device. |
Can you describe how this is supposed to work? Then I might be able to develop a fix (I normally have a Spotifyd/librespot instance running 24/7 so I hit this frequently). |
There are two components, called accesspoint (AP) and dealer, that hold persistent connections (TCP and WebSocket). They both rely on a ping/pong mechanism (every 120 seconds for the AP, every 30 for the dealer) to keep the server and client connected. The idea behind that code is that if no ping/pong is received, the connection is closed, which causes the recv loop to fail and start reconnecting. I think this is somewhat related to #48. |
Investigated the issue a bit more. This is where the error message is generated, together with the code that's supposed to close the connection: Lines 134 to 139 in b95e6ac
The
So apparently once the connection breaks it gets stuck in this channel receive operation? Lines 236 to 237 in b95e6ac
I'll add some debugging code to try to figure out why it gets stuck there. |
That's interesting, I've been looking at the wrong place for some time. I wonder where @aykevl Are you using |
No, I simply sent the process a SIGABRT when it got stuck and pulled out the stack trace of the right goroutine. |
This is the
So apparently it's stuck here: go-librespot/audio/provider.go Line 110 in 9e41b6c
|
Going further down the rabbit hole, here is the stack trace for the KeyProvider:
So it's just waiting in the I'm very much out of my depth here, but I hope this gives some ideas how to fix the issue. |
Surely there is potential for a bug in |
Thanks! I'll be trying this patch, see if it improves anything. I should know whether it helps in a day or so. |
The fix appears to be working! When I connected after a few hours I got "authenticated Login5 as (...)" which I guess means the connection was re-established. I'll monitor the daemon some more and will let you know if there's any issue. I normally run a Spotify daemon (go-librespot or otherwise) 24/7 and use it frequently, so I should easily find issues that only happen after running for a while. |
While the issue with the "deadlock" is fixed, I seem to be still having issues:
All seems good, but the device disappears from Spotify Connect. |
Yeah, I'm also still hitting problems. Just slightly different ones. Here is one:
Here is another:
I'm not exactly sure what happened the second time, but it seems to have triggered something on Spotify's side. When it automatically restarted the process, it could not authenticate anymore:
I had to reset my password to use Spotify again. Maybe unrelated, but it seems like a weird coincidence. |
Somewhat unrelated, but I got this error in my log at some point and found it interesting:
go-librespot was already kinda stuck due to previous errors so I don't know whether it would normally have recovered from this. |
I know one person that had to do it, but in 5 years of messing with Spotify stuff I don't think it ever happened to me. Strange. Anyway, the daemon is a lot more stable for me. Do you mind if we close this and create new issues dedicated to each specific problem? @aykevl |
🤷♀️ As long as it doesn't keep happening.
It doesn't seem fixed to me? Or did I miss something? I also made a quick patch where I retried sending the message on timeout, but the second attempt also timed out. So whatever the problem is, it doesn't fix itself after the timeout. |
Ah ok, didn't think there were still problems with AP specifically. Perhaps it's better to track AP and Dealer bugs separately. |
Maybe when the dealer connection breaks, it also needs to disconnect and reconnect the AP? |
I think I found the culprit for the double AP timeout: when reconnecting the AP would throw away all the registered receivers making the Should be fixed with 516de68 |
Yes! I was really close to figuring it out, you found it slightly earlier :) I'll test the patch and let you know whether it fixes the issue for me! |
So far holding steady (it would usually have errored out by now). |
Sometimes the Shannon connection (whatever that is) breaks, and go-librespot becomes unusable:
Here is another error:
(this time without the "did not receive last pong from dealer" messages, though it did start to spew those messages when I tried, and failed, to connect to the go-librespot instance).
I inserted some debugging code, and the stack trace where this error is generated in both cases looks like this:
Looking at the code (AccessPoint.recvLoop), it looks like it is intended to reestablish the connection when an error occurs, but clearly it doesn't for me.