Websocket server crashes when client disconnects (50% of the time). #33279
Comments
I'd love to work on this myself if the regular networking contributors do not have the bandwidth, because the server crashing at disconnect is several orders of magnitude worse than the client crashing. 😅 |
Here is the error log when 2 clients disconnected:
|
I'm unable to reproduce the crash using the websocket multiplayer demo, can you provide the stack trace? |
@Faless Excuse my ignorance, but where can I find the stack trace? It's not showing in any of the bottom consoles (I can only copy the error) and there's nothing more verbose in the commandline either. I tried having the Remote Debugger on and off, it didn't show anything else when the error happened. |
I enabled verbose stdout in the editor settings, and it showed an extra line that wasn't there before (see the **-ed lines).
|
I am able to reproduce with this project (on alpha 3 and master). I forgot to reduce the display size for the client before zipping, you might want to reset it to the default instead of 1920x1080. |
When trying to mass-reproduce it I discovered it is more likely to happen if the client has been connected for a few seconds, mashing Run -> Stop -> Run -> Stop is less likely to win. I suspect I might be getting superstitious though, it's pretty random 😄 |
Well, you said the server is crashing.
Can you confirm the server crashes (i.e. exits) when those errors happen? |
@Faless I'm running in the editor, that's where it crashes, as in it stops and goes back to the editor, and there is no stack trace. The process just stops/exits. I can confirm it is crashing/exiting with every fiber in my bones 😄 |
@Faless If it makes a difference it would only take me a few minutes to run it on AWS as an alpha3 server build. Also thank you for replying I really appreciate it. |
I'm still unable to reproduce this :(, tried connecting/disconnecting multiple times, with more than one client connected, waiting a few seconds, still no crashes... |
@Faless Could it be a platform issue? I'm developing on Win10 and rarely test on Linux, so I'll take a few minutes to upload the pck to AWS and get back to you with the results. |
I'm not sure, but you mentioned that it happened also on Ubuntu 18.x so I thought you already had a way to test it on linux and reproduced the error there. |
@asheraryam The crash stack trace isn't printed back to the editor, so you should start Godot directly in a terminal while in the project directory (it will run automatically):
cd /path/to/project/folder
/path/to/godot/binary
# Or:
/path/to/godot/binary --path /path/to/project/folder |
Well the good news is that I got it to crash on Ubuntu, the bad news is that the extra text was
and then back to the OS. I used the pck; should I use the project instead? Note that my Linux testing environment is a server, so I can't open the project in the editor on Linux. |
Weird, and that is with which version of Godot? 3.2-alpha3? Are you using the release export template? Can you try using the debug export template instead?
No, it shouldn't matter... |
Alternatively, can you try running the server with gdb? I know this is suboptimal, but the integrated stack tracing function does not seem to work :( .
The (after gdb loads): When the server segfaults it should bring you to the Should give you the stack trace... EDIT: I've also tested with the server platform, still unable to reproduce :'( |
Yes this is with 3.2alpha and using a debug build. @Faless I'll run gdb on the linux server and get back to you in a few minutes. Edit: Also to be fair, it's not as easy to reproduce as I initially thought, it usually happens after longer testing sessions, so when I spam run/close it's much less likely to happen. |
Finally something!
bt:
Edit: It doesn't look strictly related to networking, but I never had it with the client or anything else, so it must be a websockets server thing... right? 😅 |
It can be related to this issue - #33290
|
@qarmin I'll try to do that but it might take longer than a few minutes since the aws instance doesn't have the tooling/repo cloned yet. |
Steps to reproduce:
I have got this backtrace:
|
@qarmin Thank you for your help! I'm maybe halfway done with the godot build so I might as well report back anyway just in case I get varying results. Edit: After around an hour of compiling, the custom build on AWS refused to run the server; it could be that the extra debug flags put too much load on the processor. |
I think we are hitting a similar situation. At a glance I think it might be a race condition where a user disconnects in the middle of an RPC call. Upping the packet and buffer size seemed to help some but we still get it quite frequently if we have a fair number of players coming in and out. Here is our log and stack dump:
|
So maybe I found something that may help. Digging a bit into the websocket code (mainly lws_server in 3.1 and wsl_server in 3.2) I found what I think might be causing my error in the godot/modules/websocket/wsl_server.cpp Lines 285 to 289 in cc3b7d2
From this code you can see that an error check is being done to check if the passed peer id is in the godot/modules/websocket/wsl_server.cpp Lines 268 to 271 in cc3b7d2
If this one fails though NULL is returned. This means if anything modifies that map between the first and second check and removes that peer the Due to this I think the way to fix it is similar to the fix that was applied to the I do not know if this would fix the issue that @asheraryam is encountering but there seems to be a fair number of instances of This is all just a theory right now though so I could be wrong. I have the changes implemented for 3.1 and plan on testing it tomorrow so maybe I might be able to get more useful information then. |
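The check-then-fetch pattern described above can be illustrated with a minimal, self-contained sketch. This is not Godot's code: `Peer`, `PeerRegistry`, and `get_peer_address` are hypothetical names, and a plain `std::map` stands in for the server's peer map. The only point is that a single lookup followed by a null check on its result leaves no window between a separate "has peer" check and a second lookup that may return NULL. It does not add any locking, so it would not by itself make concurrent modification safe.

```cpp
#include <map>
#include <memory>
#include <string>

// Hypothetical stand-in for a connected websocket peer.
struct Peer {
    std::string address;
};

// Hypothetical stand-in for the server's id -> peer map.
class PeerRegistry {
public:
    void add(int id, const std::string &address) {
        peers_[id] = std::make_shared<Peer>(Peer{address});
    }

    void remove(int id) { peers_.erase(id); }

    // Returns nullptr when the id is unknown, mirroring a getter
    // that can fail and hand back NULL.
    std::shared_ptr<Peer> get_peer(int id) const {
        auto it = peers_.find(id);
        if (it == peers_.end()) {
            return nullptr;
        }
        return it->second;
    }

    // Look the peer up once and check the result before using it,
    // instead of checking membership first and fetching afterwards.
    bool get_peer_address(int id, std::string &out) const {
        std::shared_ptr<Peer> peer = get_peer(id);
        if (!peer) {
            return false; // Peer already gone: report failure, do not dereference.
        }
        out = peer->address;
        return true;
    }

private:
    std::map<int, std::shared_ptr<Peer>> peers_;
};
```

With this shape, a caller asking for a peer that has just disconnected gets `false` back rather than dereferencing a null pointer.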
Hi, I'm having this exact problem on a simple asteroids game I'm working on with websockets. I have two server builds...
On Windows, I have a GUI and I tested the server by connecting and disconnecting 1-6 clients on and off for about 10 minutes. No crashes or errors to report. In a Docker container, I run the same server scene and connect and disconnect 1-6 clients on and off for about 10 minutes. It randomly crashes on me with I've done as suggested above, and run
I am fairly sure it's not my code... but who knows? right??? If it is helpful, I'll try to boil it down to a more simple project than the one I'm working on, only if that's helpful to someone troubleshooting the problem. Please let me know! Thanks and much appreciation to those that know more than me and help fix these problems! Cheers... Your dev buddy... - ET PS - If there is a more "pro" way to debug this, please let me know, and I'll dump that data too! Thanks again! |
Yes, that would be really helpful. |
Demo Project - Platform Buddies!
The sole purpose of this demo is to replicate the problem I'm having above. https://github.com/ETdoFresh/PlatformBuddies However, no luck just yet... I can't get the server to crash! Here are the main differences at this point in the demo...
|
OK, I've made a little more progress. I decided to grab a letsencrypt certificate for PlatformBuddies!. Now I'm starting to see a warning/error on disconnect.
Source: https://github.com/ETdoFresh/PlatformBuddies |
This doesn't seem related to the issue at hand. |
Agreed. This was just an observation on my journey.... Well... FYI, I'm really close. I decided to work my way backwards from my Asteroids project towards my PlatformBuddies project, and I fixed the websocket server issue. I'm in the process of trying to inject the code that causes the issue in my Asteroids project into Platform Buddies and I'll paste that up here when I can crash the server! :P Hopefully soon, but I am leaving for the next couple hours. Be back later! |
Woot! So good news! I replicated the problem! The culprit (in my case) was a script I downloaded to show "stats". There is a call to "weakref" and "call" during the connection process which I think causes the websocket server to crash [more info below]. TLDR: Looks like a questionable script that spits out stats caused errors during the WebSocket connection process. Here's the server log:
I have committed the bad code into this repository (v0.0.5): Here is the pseudo "trace" of the problem:
var server = WebSocketServer.new()
server.connect("client_connected", self, "create_player")

func create_player(id, _protocol):
    var client = server.get_peer(id)
    var input = NETWORK_INPUT.instance()
    add_child(input)
    var character = spawn_random_character(input)
    ## ERROR LINE BELOW: ------
    $CanvasLayer/ServerStats.add_stat("X", input, "x", false)

extends Panel
var stats = []
func add_stat(stat_name, object, stat_reference, is_method):
    stats.append([stat_name, object, stat_reference, is_method])
func _process(_delta):
    var label_text = ""
    for stat in stats:
        var value = null
        if stat[1] and weakref(stat[1]).get_ref(): # MY GUESS IS THIS
            if stat[3]:
                value = stat[1].call(stat[2]) # OR THIS THAT IS CAUSING THE ISSUE
            else:
                value = stat[1].get(stat[2])
        label_text += str(stat[0], ": ", value)
        label_text += "\n"
    $VBoxContainer/Value.text = label_text |
Yeah, the culprit doesn't seem related to the websocket implementation at all. Even from the stack trace, it seems to be a GDScript call to a freed object. |
Godot version:
Latest 3.2alpha3 and master
OS/device including version:
Happens on Win10 and Ubuntu 18.x
Issue description:
I get the following error log, I suspect it could be because the server is trying to send an rpc call to a client that just left?
Steps to reproduce:
Occasionally (around 50% of the time), the server crashes when the client disconnects. I'm using the same code from the websockets demo projects from the demos repo so there is nothing custom going on in that area.
Minimal reproduction project:
The websockets projects from the demos repository.