Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WebSocketManager disconnects 30-60 minutes at a time #8558

Closed
adam-a opened this issue Aug 27, 2022 · 15 comments · Fixed by #8887
Closed

WebSocketManager disconnects 30-60 minutes at a time #8558

adam-a opened this issue Aug 27, 2022 · 15 comments · Fixed by #8887

Comments

@adam-a
Copy link

adam-a commented Aug 27, 2022

Which package is this bug report for?

ws

Issue description

Running WebSocketManager for a bot on my own personal Discord server

Tested this locally and on a staging within an EC2 instance -- the same issue applies

Randomly throughout the day this will occur and could go on for upwards of 30-60 minutes.

How can I ensure no/minimal disconnection? (Code sample below -- very straightforward implementation)

Connecting to [wss://gateway.discord.gg?v=10&encoding=json](wss://gateway.discord.gg/?v=10&encoding=json)
info:    Debug
Waiting for event hello for 60000ms
info:    Heartbeat
info:    Debug
Starting to heartbeat every 41250ms
info:    Debug
Identifying
        shard id: 0
        shard count: 1
        intents: 34305
        compression: none
info:    Debug
Disconnected normally from code 1000
info:    Debug
Destroying shard
        Reason: Got disconnected by Discord
        Code: 1000
        Recover: Reconnect
info:    Debug
Connecting to [wss://gateway.discord.gg?v=10&encoding=json](wss://gateway.discord.gg/?v=10&encoding=json)
info:    Debug
Waiting for event hello for 60000ms
info:    Heartbeat
info:    Debug
Starting to heartbeat every 41250ms
info:    Debug
Identifying
        shard id: 0
        shard count: 1
        intents: 34305
        compression: none
info:    Debug
Disconnected normally from code 1000
info:    Debug
Destroying shard
        Reason: Got disconnected by Discord
        Code: 1000
        Recover: Reconnect
info:    Debug
Connecting to [wss://gateway.discord.gg?v=10&encoding=json](wss://gateway.discord.gg/?v=10&encoding=json)
info:    Debug
Waiting for event hello for 60000ms
info:    Heartbeat
info:    Debug
Starting to heartbeat every 41250ms
info:    Debug
Identifying
        shard id: 0
        shard count: 1
        intents: 34305
        compression: none
info:    Debug
Disconnected normally from code 1000
info:    Debug
Destroying shard
        Reason: Got disconnected by Discord
        Code: 1000
        Recover: Reconnect
info:    Debug
Connecting to [wss://gateway.discord.gg?v=10&encoding=json](wss://gateway.discord.gg/?v=10&encoding=json)
info:    Debug
Waiting for event hello for 60000ms
info:    Heartbeat
info:    Debug
Starting to heartbeat every 41250ms
info:    Debug
Identifying
        shard id: 0
        shard count: 1
        intents: 34305
        compression: none

Code sample

if (!authToken) throw Error('BOT AuthToken not set');

  const rest = new REST().setToken(authToken);

  const manager = new WebSocketManager({
    handshakeTimeout: 20000,
    token: authToken,
    intents,
    rest,
  });

  manager.setStrategy(new SimpleShardingStrategy(manager));

  manager.addListener(WebSocketShardEvents.Debug, async (payload: { message: string }) =>
    logger.info(`Debug\n${payload.message}`)
  );

  manager.addListener(WebSocketShardEvents.Ready, () => logger.info('Ready'));

  manager.addListener(WebSocketShardEvents.Hello, () => logger.info('Heartbeat'));

  manager.addListener(WebSocketShardEvents.Resumed, () => logger.info('Resumed'));

  manager.addListener(
    WebSocketShardEvents.Dispatch,
    (
      event: {
        data: GatewayDispatchPayload;
      } & {
        shardId: number;
      }
    ) => logEvent(event.data)
  );

  await manager.connect();

Package version

14.3.0

Node.js version

18.6.0

Operating system

No response

Priority this issue should have

High (immediate attention needed)

Which partials do you have configured?

Message, Reaction

Which gateway intents are you subscribing to?

Guilds, GuildMessages, GuildMessageReactions, MessageContent

I have tested this issue on a development release

No response

@didinele
Copy link
Member

What version of @discordjs/ws are you on?

@adam-a
Copy link
Author

adam-a commented Aug 27, 2022

What version of @discordjs/ws are you on?

Hi! Thanks for responding.

I'm using 0.3.0.

It's driving me nuts. Even overrode updateSessionInfo and retrieveSessionInfo at some point to keep track of session info since it seems like it is being utilized in https://github.com/didinele/discord.js/blob/874ee5cee006edad6d7fe138d5dbe83f96d3f0e5/packages/ws/src/ws/WebSocketShard.ts#L105 (check line 128)

But that did not work as well.

@didinele
Copy link
Member

didinele commented Aug 27, 2022

Sorta familiar with this issue. We've been trying to fix it for a while and across multiple PRs. You're not experiencing Discord constantly telling you to reconnect (at least to my understanding), but rather a reconnect loop bug.

My latest attempt at patching this up was #8479, but seems like that wasn't enough.

cc @SpaceEEC

@adam-a
Copy link
Author

adam-a commented Aug 27, 2022

Sorta familiar with this issue. We've been trying to fix it for a while and across multiple PRs. You're not experiencing Discord constantly telling you to reconnect (at least to my understanding), but rather a reconnect loop bug.

My latest attempt at patching this up was #8479, but seems like that wasn't enough.

cc @SpaceEEC

Do you think it's an issue in the underlying WebSocket package?

I implemented my bot with the standard WebSocket package before finding discord.js, and I had the same issue.

Also, side note, today I noticed this:

Waiting for event hello for 60000ms
info:    Heartbeat
info:    Debug
Starting to heartbeat every 41250ms
info:    Debug
Resuming session
info:    Debug
Resuming session
/Users/some/Documents/Github/some/shared_services/node_modules/ws/lib/websocket.js:441
      throw new Error('WebSocket is not open: readyState 0 (CONNECTING)');
            ^
Error: WebSocket is not open: readyState 0 (CONNECTING)
    at WebSocket.send (/Users/some/Documents/Github/some/shared_services/node_modules/ws/lib/websocket.js:441:13)
    at WebSocketShard.send (/Users/some/Documents/Github/some/shared_services/node_modules/@discordjs/ws/dist/ws/WebSocketShard.cjs:168:21)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at async WebSocketShard.connect (/Users/some/Documents/Github/some/shared_services/node_modules/@discordjs/ws/dist/ws/WebSocketShard.cjs:97:7)
    at async WebSocketShard.onMessage (/Users/some/Documents/Github/some/shared_services/node_modules/@discordjs/ws/dist/ws/WebSocketShard.cjs:322:11)
[nodemon] app crashed - waiting for file changes before starting...

@didinele
Copy link
Member

Do you think it's an issue in the underlying WebSocket package?

No.

@adam-a
Copy link
Author

adam-a commented Aug 27, 2022

Do you think it's an issue in the underlying WebSocket package?

No.

I'd like to clone discord.js and assist in this effort. What's a good starting point for me particularly for this recurring loop bug?

@didinele
Copy link
Member

packages/ws/src/ws/WebSocketShard.ts - mainly the onClose handler and the destroy method.

@Dossar
Copy link

Dossar commented Oct 7, 2022

Sorta familiar with this issue. We've been trying to fix it for a while and across multiple PRs. You're not experiencing Discord constantly telling you to reconnect (at least to my understanding), but rather a reconnect loop bug.

My latest attempt at patching this up was #8479, but seems like that wasn't enough.

cc @SpaceEEC

Hello, it looks like this thread might be the same root cause for bots disconnecting after an hour or earlier:
#8486 (comment)

To summarize the findings of what I saw in that thread,

The common data point seems to be that somewhere between v13.8.1 and v13.10.3, something introduced this issue of bots disconnecting from the client. And then that ended up getting into v14.x

@legendhimself
Copy link
Contributor

legendhimself commented Oct 7, 2022

might be the same root cause for bots disconnecting after an hour or earlier:
#8486 (comment)

@Dossar
this is an issue from the new WS package and it has nothing to do with those issues since even in the latest versions DJS still does not make use of the WS package for creating the WebSockets or connecting to the discord gateway.

@didinele
Copy link
Member

Hi! Are you still able to reproduce this issue on 0.4.1?

@adam-a
Copy link
Author

adam-a commented Nov 2, 2022

Hi! Are you still able to reproduce this issue on 0.4.1?

Hi there! I just updated it and will be watching it. Will lyk! Thank you for the fix.

@adam-a
Copy link
Author

adam-a commented Nov 3, 2022

@didinele Unfortunately, the issue persists with 0.4.1 :/

Nov 3, 2022 @ 11:54:13.921 | Debug Identifying 	shard id: 0 	shard count: 1 	intents: 34305 	compression: none
-- | --

  | Nov 3, 2022 @ 11:54:13.920 | Heartbeat

  | Nov 3, 2022 @ 11:54:13.920 | Debug Starting to heartbeat every 41250ms

  | Nov 3, 2022 @ 11:54:13.757 | Debug Waiting for event hello for 60000ms

  | Nov 3, 2022 @ 11:54:13.756 | Debug Connecting to wss://gateway.discord.gg?v=10&encoding=json

  | Nov 3, 2022 @ 11:54:13.755 | Debug Destroying shard 	Reason: Got disconnected by Discord 	Code: 1000 	Recover: Reconnect

  | Nov 3, 2022 @ 11:54:13.755 | Debug Disconnected normally from code 1000

  | Nov 3, 2022 @ 11:53:18.760 | Debug Starting to heartbeat every 41250ms

  | Nov 3, 2022 @ 11:53:18.760 | Debug Identifying 	shard id: 0 	shard count: 1 	intents: 34305 	compression: none

  | Nov 3, 2022 @ 11:53:18.760 | Heartbeat

  | Nov 3, 2022 @ 11:53:18.604 | Debug Waiting for event hello for 60000ms

  | Nov 3, 2022 @ 11:53:18.602 | Debug Disconnected normally from code 1000

  | Nov 3, 2022 @ 11:53:18.602 | Debug Destroying shard 	Reason: Got disconnected by Discord 	Code: 1000 	Recover: Reconnect

  | Nov 3, 2022 @ 11:53:18.602 | Debug Connecting to wss://gateway.discord.gg?v=10&encoding=json

  | Nov 3, 2022 @ 11:53:09.169 | HTTP Request

  | Nov 3, 2022 @ 11:52:23.398 | Debug Starting to heartbeat every 41250ms

  | Nov 3, 2022 @ 11:52:23.398 | Debug Identifying 	shard id: 0 	shard count: 1 	intents: 34305 	compression: none

  | Nov 3, 2022 @ 11:52:23.397 | Heartbeat

  | Nov 3, 2022 @ 11:52:23.247 | Debug Waiting for event hello for 60000ms

  | Nov 3, 2022 @ 11:52:23.246 | Debug Connecting to wss://gateway.discord.gg?v=10&encoding=json

  | Nov 3, 2022 @ 11:52:23.245 | Debug Disconnected normally from code 1000

  | Nov 3, 2022 @ 11:52:23.245 | Debug Destroying shard 	Reason: Got disconnected by Discord 	Code: 1000 	Recover: Reconnect

  | Nov 3, 2022 @ 11:52:01.024 | HTTP Request

@didinele
Copy link
Member

didinele commented Nov 30, 2022

After another month of on-and-off debugging and a lot of frustration with being generally unable to reproduce this issue in a consistent matter, I've finally had a breakthrough and tracked it down to be how we handle the send ratelimit - the issue boils down to some reversed logic that was missed during review, essentially meaning that after ~120 heartbeats we think we're about to get ratelimited and block the entire send queue for... a lot of time (could be something like ~20 minutes in some cases).

WS connections auto-magically close with a 1000 close code if enough time passes with no flowing payloads. The queue stays stuck after we reconnect, so our IDENTIFY never goes through, once again, causing the connection to eventually close due to inactivity (causing the "reconnect loop" that we see) - eventually the sleep call ends and we successfully reconnect and start the cycle over again.

I'll have a PR fixing this later today, appreciate the flag and the effort in providing me with logs!

@JMTK
Copy link
Contributor

JMTK commented Nov 30, 2022

@didinele would this also alleviate issues related to #8486 / #4095 ?

@didinele
Copy link
Member

@didinele would this also alleviate issues related to #8486 / #4095 ?

discord.js (mainlib) does not currently use @discordjs/ws, it has its own own WS implementation (the "old" one), so no, those changes do not help at all at this time, but they will potentially will in the future when the main library starts using the ws package.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants