Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Random Crash When RTM Reading #118

Closed
1 task
KnightOfNight opened this issue Aug 24, 2016 · 19 comments
Closed
1 task

Random Crash When RTM Reading #118

KnightOfNight opened this issue Aug 24, 2016 · 19 comments

Comments

@KnightOfNight
Copy link

  • [X ] I've read and understood the Contributing guidelines and have done my best effort to follow them.
  • [ X] I've read and agree to the Code of Conduct.
  • [ X] I've searched for any related issues and avoided creating a duplicate issue.

Description

Slack client randomly disconnects with the following stack trace. I have tried adding a thread to my bot that pings on a regular basis, but that doesn't prevent the crash. I found what I thought was a related bug, but I'm not sure if it's the same issue or not. While this bug persists, bots don't stay running and have to be watched by a parent process that restarts it.

Traceback (most recent call last):
File "./vobot.py", line 163, in
mainloop()
File "./vobot.py", line 142, in mainloop
channel, message = read_output(SLACK_CLIENT.rtm_read())
File "/usr/local/lib/python2.7/dist-packages/slackclient/_client.py", line 120, in rtm_read
json_data = self.server.websocket_safe_read()
File "/usr/local/lib/python2.7/dist-packages/slackclient/_server.py", line 140, in websocket_safe_read
data += "{0}\n".format(self.websocket.recv())
File "/usr/local/lib/python2.7/dist-packages/websocket/_core.py", line 298, in recv
opcode, data = self.recv_data()
File "/usr/local/lib/python2.7/dist-packages/websocket/_core.py", line 315, in recv_data
opcode, frame = self.recv_data_frame(control_frame)
File "/usr/local/lib/python2.7/dist-packages/websocket/_core.py", line 328, in recv_data_frame
frame = self.recv_frame()
File "/usr/local/lib/python2.7/dist-packages/websocket/_core.py", line 360, in recv_frame
return self.frame_buffer.recv_frame()
File "/usr/local/lib/python2.7/dist-packages/websocket/_abnf.py", line 312, in recv_frame
self.recv_header()
File "/usr/local/lib/python2.7/dist-packages/websocket/_abnf.py", line 261, in recv_header
header = self.recv_strict(2)
File "/usr/local/lib/python2.7/dist-packages/websocket/_abnf.py", line 346, in recv_strict
bytes = self.recv(min(16384, shortage))
File "/usr/local/lib/python2.7/dist-packages/websocket/_core.py", line 429, in _recv
return recv(self.sock, bufsize)
File "/usr/local/lib/python2.7/dist-packages/websocket/_socket.py", line 89, in recv
raise WebSocketConnectionClosedException("Connection is already closed.")
websocket._exceptions.WebSocketConnectionClosedException: Connection is already closed.

Reproducible in:

  • This is reproducible in the sample project.
    python-slackclient version: 1.0.1
    Python Version: 2.7.6
    OS Version: Ubuntu 14.04.5 LTS

Steps to reproduce:

  1. connect to Slack using this library with a call to rtm_connect()
  2. read from Slack using rtm_read()
  3. wait for the crash to happen at a random time

Expected result:

I expect the bot to keep running.

Actual result:

The bot crashes.

Attachments:

@DKorytkin
Copy link

DKorytkin commented Aug 26, 2016

I have this problem too, added my traceback
python3.5.2 ubuntu16.04

Traceback (most recent call last):
  File "/home/dkorytkin/mario/slack_bot/main.py", line 117, in <module>
    req = sc.rtm_read()
  File "/usr/local/lib/python3.5/dist-packages/slackclient/_client.py", line 120, in rtm_read
    json_data = self.server.websocket_safe_read()
  File "/usr/local/lib/python3.5/dist-packages/slackclient/_server.py", line 140, in websocket_safe_read
    data += "{0}\n".format(self.websocket.recv())
  File "/usr/local/lib/python3.5/dist-packages/websocket/_core.py", line 298, in recv
    opcode, data = self.recv_data()
  File "/usr/local/lib/python3.5/dist-packages/websocket/_core.py", line 315, in recv_data
    opcode, frame = self.recv_data_frame(control_frame)
  File "/usr/local/lib/python3.5/dist-packages/websocket/_core.py", line 328, in recv_data_frame
    frame = self.recv_frame()
  File "/usr/local/lib/python3.5/dist-packages/websocket/_core.py", line 360, in recv_frame
    return self.frame_buffer.recv_frame()
  File "/usr/local/lib/python3.5/dist-packages/websocket/_abnf.py", line 312, in recv_frame
    self.recv_header()
  File "/usr/local/lib/python3.5/dist-packages/websocket/_abnf.py", line 261, in recv_header
    header = self.recv_strict(2)
  File "/usr/local/lib/python3.5/dist-packages/websocket/_abnf.py", line 346, in recv_strict
    bytes = self.recv(min(16384, shortage))
  File "/usr/local/lib/python3.5/dist-packages/websocket/_core.py", line 429, in _recv
    return recv(self.sock, bufsize)
  File "/usr/local/lib/python3.5/dist-packages/websocket/_socket.py", line 89, in recv
    raise WebSocketConnectionClosedException("Connection is already closed.")
websocket._exceptions.WebSocketConnectionClosedException: Connection is already closed.`

example my code

sc = SlackClient(SLACK_TOKEN)
if sc.rtm_connect():

    while True:
        req = sc.rtm_read()

@kyledouglas
Copy link

kyledouglas commented Sep 18, 2016

Traceback (most recent call last):
  File "rtmbot.py", line 28, in <module>
    bot.start()
  File "/var/scripts/devSFTP/douglask/visionBot-dev/rtmbot/core.py", line 78, in start
    self._start()
  File "/var/scripts/devSFTP/douglask/visionBot-dev/rtmbot/core.py", line 65, in _start
    for reply in self.slack_client.rtm_read():
  File "/usr/local/lib/python2.7/dist-packages/slackclient/_client.py", line 120, in rtm_read
    json_data = self.server.websocket_safe_read()
  File "/usr/local/lib/python2.7/dist-packages/slackclient/_server.py", line 140, in websocket_safe_read
    data += "{0}\n".format(self.websocket.recv())
  File "/usr/local/lib/python2.7/dist-packages/websocket/_core.py", line 298, in recv
    opcode, data = self.recv_data()
  File "/usr/local/lib/python2.7/dist-packages/websocket/_core.py", line 315, in recv_data
    opcode, frame = self.recv_data_frame(control_frame)
  File "/usr/local/lib/python2.7/dist-packages/websocket/_core.py", line 328, in recv_data_frame
    frame = self.recv_frame()
  File "/usr/local/lib/python2.7/dist-packages/websocket/_core.py", line 360, in recv_frame
    return self.frame_buffer.recv_frame()
  File "/usr/local/lib/python2.7/dist-packages/websocket/_abnf.py", line 312, in recv_frame
    self.recv_header()
  File "/usr/local/lib/python2.7/dist-packages/websocket/_abnf.py", line 261, in recv_header
    header = self.recv_strict(2)
  File "/usr/local/lib/python2.7/dist-packages/websocket/_abnf.py", line 346, in recv_strict
    bytes = self.recv(min(16384, shortage))
  File "/usr/local/lib/python2.7/dist-packages/websocket/_core.py", line 429, in _recv
    return recv(self.sock, bufsize)
  File "/usr/local/lib/python2.7/dist-packages/websocket/_socket.py", line 89, in recv
    raise WebSocketConnectionClosedException("Connection is already closed.")
websocket._exceptions.WebSocketConnectionClosedException: Connection is already closed.

Adding traceback as I am also seeing this issue when running against a list. Seems like a timeout is reached and will continue to dig. The problem is intermittent for me when testing but does cause stability issues for my bot

@ajferrick
Copy link
Contributor

wait for the crash to happen at a random time

I'm curious, do you notice it happening unpredictably, or does it tend to crash after staying idle for some time?

@cgreatorexatonshape
Copy link

In my testing I wasn't able to find any pattern. Sometimes it crashes soon after my read loop starts, other times it takes a number (as in dozens) of hours.

@TheKevJames
Copy link

TheKevJames commented Sep 27, 2016

Commenting since I've run into this issue on-and-off with thekevjames/jarvis since version 0.16 of this library at least. My personal suspicion is that it's an issue with the Slack websocket closing for some reason unrelated to the python library specifically, though I've only done a cursory look at this library.

As a workaround, I do:

while is_polling:
    try:
        for message in slack.rtm_read():
            # processing, etc
    except WebSocketConnectionClosedException:
        slack.rtm_connect()

@Roach
Copy link
Contributor

Roach commented Dec 9, 2016

This might seem unrelated, but how many members do your teams have?

I'm wondering if this is related to #127

@TheKevJames
Copy link

I've seen this occur on test teams with a single user (plus the bot).

@chrissphinx
Copy link

chrissphinx commented Jan 25, 2017

Traceback (most recent call last):
  File "bot.py", line 61, in <module>
    command, channel = parse_slack_output(slack_client.rtm_read())
  File "/home/hathor/bot/local/lib/python2.7/site-packages/slackclient/_client.py", line 120, in rtm_read
    json_data = self.server.websocket_safe_read()
  File "/home/hathor/bot/local/lib/python2.7/site-packages/slackclient/_server.py", line 138, in websocket_safe_read
    data += "{0}\n".format(self.websocket.recv())
  File "/home/hathor/bot/local/lib/python2.7/site-packages/websocket/_core.py", line 293, in recv
    opcode, data = self.recv_data()
  File "/home/hathor/bot/local/lib/python2.7/site-packages/websocket/_core.py", line 310, in recv_data
    opcode, frame = self.recv_data_frame(control_frame)
  File "/home/hathor/bot/local/lib/python2.7/site-packages/websocket/_core.py", line 323, in recv_data_frame
    frame = self.recv_frame()
  File "/home/hathor/bot/local/lib/python2.7/site-packages/websocket/_core.py", line 357, in recv_frame
    return self.frame_buffer.recv_frame()
  File "/home/hathor/bot/local/lib/python2.7/site-packages/websocket/_abnf.py", line 336, in recv_frame
    self.recv_header()
  File "/home/hathor/bot/local/lib/python2.7/site-packages/websocket/_abnf.py", line 286, in recv_header
    header = self.recv_strict(2)
  File "/home/hathor/bot/local/lib/python2.7/site-packages/websocket/_abnf.py", line 371, in recv_strict
    bytes_ = self.recv(min(16384, shortage))
  File "/home/hathor/bot/local/lib/python2.7/site-packages/websocket/_core.py", line 427, in _recv
    return recv(self.sock, bufsize)
  File "/home/hathor/bot/local/lib/python2.7/site-packages/websocket/_socket.py", line 93, in recv
    "Connection is already closed.")
websocket._exceptions.WebSocketConnectionClosedException: Connection is already closed.

I'm also experiencing this issue. @ajferrick it does seem to occur (for me, at least) when saying idle for some time. Also noticed the data it read just before crashing was: {}.

Any way to work around this for now?

@diamondsw
Copy link

diamondsw commented Feb 27, 2017

Afraid that I'm not adding anything other than "me too"; similar stack trace after being online for some unknown number of hours/days (less than 72 hours, at least). I'll attempt wrapping this in the exception handling listed above (I already had to add exception handling for socket.error to handle connection timeouts). This really seems like it should be more robust, or at the very least the documentation should cover what sort of exceptions we should expect from slack.rtm_read().

This may be silly, but is this related to gkbrk/hackchat#1 by any chance?

Traceback (most recent call last):
  File "./netsuitebot.py", line 142, in <module>
    if __name__ == "__main__":
  File "./netsuitebot.py", line 139, in main
    bot = SlackBot(args.token, args.name, args.file)
  File "./netsuitebot.py", line 66, in run
    command, channel = self.__parse_slack_output(self.slack.rtm_read())
  File "/home/jochs/netsuitebot/slackclient/local/lib/python2.7/site-packages/slackclient/_client.py", line 123, in rtm_read
    json_data = self.server.websocket_safe_read()
  File "/home/jochs/netsuitebot/slackclient/local/lib/python2.7/site-packages/slackclient/_server.py", line 141, in websocket_safe_read
    data += "{0}\n".format(self.websocket.recv())
  File "/home/jochs/netsuitebot/slackclient/local/lib/python2.7/site-packages/websocket/_core.py", line 293, in recv
    opcode, data = self.recv_data()
  File "/home/jochs/netsuitebot/slackclient/local/lib/python2.7/site-packages/websocket/_core.py", line 310, in recv_data
    opcode, frame = self.recv_data_frame(control_frame)
  File "/home/jochs/netsuitebot/slackclient/local/lib/python2.7/site-packages/websocket/_core.py", line 323, in recv_data_frame
    frame = self.recv_frame()
  File "/home/jochs/netsuitebot/slackclient/local/lib/python2.7/site-packages/websocket/_core.py", line 357, in recv_frame
    return self.frame_buffer.recv_frame()
  File "/home/jochs/netsuitebot/slackclient/local/lib/python2.7/site-packages/websocket/_abnf.py", line 336, in recv_frame
    self.recv_header()
  File "/home/jochs/netsuitebot/slackclient/local/lib/python2.7/site-packages/websocket/_abnf.py", line 286, in recv_header
    header = self.recv_strict(2)
  File "/home/jochs/netsuitebot/slackclient/local/lib/python2.7/site-packages/websocket/_abnf.py", line 371, in recv_strict
    bytes_ = self.recv(min(16384, shortage))
  File "/home/jochs/netsuitebot/slackclient/local/lib/python2.7/site-packages/websocket/_core.py", line 427, in _recv
    return recv(self.sock, bufsize)
  File "/home/jochs/netsuitebot/slackclient/local/lib/python2.7/site-packages/websocket/_socket.py", line 93, in recv
    "Connection is already closed.")
websocket._exceptions.WebSocketConnectionClosedException: Connection is already closed.

@TheKevJames
Copy link

@diamondsw I can't give you a definitive answer there, but from personal experience it seems that running a regular ping task doesn't seem to prevent this error. See my bot's keepalive task.

@phonkee
Copy link

phonkee commented Apr 4, 2017

Hi guys.
This is what I found (my bot connects really seldom) during my debugging.
When RTM (real time websocket) starts it sends rtm.start request first. In the response there is websocket url along with other information such as users (all users) etc...
In my case this response has cca 40 megabytes (yeah megabytes).
As per documentation:
The Websocket URLs provided by rtm.start are single-use and are only valid for 30 seconds, so make sure to connect quickly. If you connect successfully the first event received will be a hello:
The message is consumed and parsed usually not in this interval (30 seconds) so I got usually this response:
{u'type': u'error', u'error': {u'msg': u'Socket URL has expired', u'code': 1}}

Then my bot tries to reconnect and this scenario is usually repeating over and over.

@Ni-Knight
Copy link

We encounter it as well every once in a while :(

@aoberoi
Copy link
Contributor

aoberoi commented Jun 26, 2017

i think @phonkee's issue is distinct from the one being discussed here. that is related to very large teams having a lot more data in the rtm.start method's response, which can sometimes timeout because its so big. i've created this issue to address this problem with a new feature: #190.


we're going to need more data on the problem if we want to solve this issue. one thing we've gathered is potentially disconnects tend to occur when the bot is relatively idle as opposed to busy. i'd really appreciate any other patterns or data points you all have regarding this issue.

im creating a related issue for implementing better, deeper logging. if we had that before all these reports, we might have had the right data to diagnose and solve the issue already: #192

@SupermanIsEvious
Copy link

SupermanIsEvious commented Jul 7, 2017

warn_log_070617.txt
I receive websocket-related errors at least once per day. I'm attaching my error log here, along with the log entry preceding each error.

I've noticed that in the event preceding every WebSocketConnectionClosedException, rtm_read() returns an [{}], whereas the standard empty response is [].

Additionally, i should note that my bot currently runs rtm_read() every second. Sample code:

def _auto_reconnect(self, running):
    """Validates the connection boolean returned via `SlackClient.rtm_connect()`
    if running is False:
        * Attempt to auto reconnect a max of 5 times
        * For every attempt, delay reconnection (F_n)*5, where n = num of retries
    Parameters
    -----------
    running : bool
        The boolean value returned via `SlackClient.rtm_connect()`
    Returns
    --------
    running : bool
        The validated boolean value returned via `SlackClient.rtm_connect()`
    """
    while not running:
        if self._retries < self.slacker.settings.max_retries:
            self._retries += 1
            try:
                # delay for longer and longer each retry in case of extended outages
                current_delay = (self._retries + (self._retries - 1))*5 # fibonacci, bro
                self._logger.info(
                    "Attempting reconnection %s of %s in %s seconds...",
                    self._retries,
                    self.slacker.settings.max_retries,
                    current_delay
                )
                sleep(current_delay)
                running = self.slacker.rtm_api.rtm_connect()
            except KeyboardInterrupt:
                self._logger.info("KeyboardInterrupt received.")
                break
        else:
            self._logger.error("Max retries exceeded")
            break
    return running

def run(self):
    """Passes the `SlackClient.rtm_connect()` method into `self._auto_reconnect()` for validation
    if running:
        * Capture and parse events via `Slacker.process_events()` every second
        * Close gracefully on KeyboardInterrupt (for testing)
        * log any Exceptions and try to reconnect if needed
    Parameters
    -----------
    n/a
    Returns
    --------
    n/a
    """
    running = self._auto_reconnect(self.slacker.rtm_api.rtm_connect())
    while running:
        self._logger.debug(
            "Slack connection successful using token <%s>. Response: %s",
            self.slacker.rtm_api.slacktoken,
            running
            )
        try:
            self.slacker.process_events()
            sleep(self.slacker.settings.sleep_secs)
        except KeyboardInterrupt:
            self._logger.info("KeyboardInterrupt received.")
            running = False
        except Exception as e:
            self._logger.exception(e)
            running = self._auto_reconnect(self.slacker.rtm_api.rtm_connect())

@aoberoi
Copy link
Contributor

aoberoi commented Aug 2, 2017

@SupermanIsEvious this data is super useful, thanks so much for sharing!

i haven't yet been able to reproduce this myself, but i haven't let my connection stay alive for more than a couple hours. i suspect that we'd have to leave it on for a day to reproduce.

i don't have a lot of bandwidth at the moment for investigating further (but i agree its imp and will get to it eventually) so if anyone else wants to help solve this, i'd welcome any contribution!

@Gus-Corvil
Copy link

Gus-Corvil commented Oct 13, 2017

This happens to my bot on many early mornings, so after a few quiet hours

@micolous
Copy link

micolous commented Dec 3, 2017

I've observed that some Slack teams have (recently) been opted into an experiment (by SlackHQ) which requires that ping messages are sent by RTM clients, otherwise they will be disconnected after 2-3 minutes. This is more frequently than the other reporters, and has only become an issue in the last 4 days for me.

This issue does not appear to be a connectivity issue (observed from multiple ASNs in different continents), is not limited in scope to one API implementation, and disconnections still occur even when there is a constant stream of events coming from the server.

However, this experiment has only been applied to a limited number of teams (I see only one team with the issue and all others without, even when running the same bot code), which don't appear to correlate with age of the team.

There is an existing Server.ping method, however this is not called by any of the code within slackclient itself. Additionally, the example code for using RTM neither mentions pinging as a requirement, nor includes the sufficient boilerplate to implement this.

My feeling is that the API surface should be simple to use, so python-slackclient should implement the pinging rather than individual implementors requiring additional, currently undocumented boilerplate.

I've fixed this in my own Slack client implementation with micolous/python-slackrealtime@e9c9441

@Qix-
Copy link

Qix- commented Dec 30, 2017

I receive this message instantly upon sending anything using sc.server.send_to_websocket().

For example:

def send_typing_indicator(sc, channel):
    import json
    sc.server.send_to_websocket(json.dumps({'type': 'typing', 'channel': channel}))

Untested code!

Which ultimately causes the exception to be thrown on the next events read. Maybe that will help in debugging this.

@Roach
Copy link
Contributor

Roach commented Mar 30, 2018

Fixed in 1.2.0 using client.rtm_connect(auto_reconnect=True)

See the note at the bottom of this section: https://slackapi.github.io/python-slackclient/real_time_messaging.html#rtm-start-vs-rtm-connect

@Roach Roach closed this as completed Mar 30, 2018
nikanar pushed a commit to nikanar/ACCM-bot that referenced this issue Aug 15, 2018
trygveaa added a commit to wee-slack/wee-slack that referenced this issue Mar 13, 2019
Thanks to @celsworth and @micolous for pointing out that Slack now
requires this for some teams.

From slackapi/python-slack-sdk#118 (comment):

    I've observed that some Slack teams have (recently) been opted into
    an experiment (by SlackHQ) which requires that ping messages are
    sent by RTM clients, otherwise they will be disconnected after 2-3
    minutes. This is more frequently than the other reporters, and has
    only become an issue in the last 4 days for me.

    However, this experiment has only been applied to a limited number
    of teams (I see only one team with the issue and all others without,
    even when running the same bot code), which don't appear to
    correlate with age of the team.

May partly fix #679, but that seems to also be caused by an issue in
websocket-client.
ericswpark added a commit to ericswpark/python_transporter that referenced this issue Mar 29, 2019
The websocket connection drops because of inactivity. When it does, the slackclient library does not automatically attempt to reconnect without explicitly being told to do so. When we try to send a message while the socket is closed, we get an error and the script crashes.

Pages related to this issue:
slackapi/python-slack-sdk#118
https://slack.dev/python-slackclient/real_time_messaging.html#rtm-start-vs-rtm-connect

Possible auto-ping solution required:
https://github.com/TheKevJames/jarvis/blob/master/bot/jarvis/core/async.py#L39-L49

-----------------------------

Turns out, the websocket-client library has a fault with 0.55.0. Downgrading to 0.54.0 *should* fix the issue, but I already have 0.54.0 on my local machine, so that's not the problem.

Issues:
slackapi/python-slack-sdk#385
slackapi/python-slack-sdk#127
slackapi/python-slack-sdk#387
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests