-
Notifications
You must be signed in to change notification settings - Fork 835
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Random Crash When RTM Reading #118
Comments
I have this problem too, added my traceback
example my code
|
Adding traceback as I am also seeing this issue when running against a list. Seems like a timeout is reached and will continue to dig. The problem is intermittent for me when testing but does cause stability issues for my bot |
I'm curious, do you notice it happening unpredictably, or does it tend to crash after staying idle for some time? |
In my testing I wasn't able to find any pattern. Sometimes it crashes soon after my read loop starts, other times it takes a number (as in dozens) of hours. |
Commenting since I've run into this issue on-and-off with thekevjames/jarvis since version 0.16 of this library at least. My personal suspicion is that it's an issue with the Slack websocket closing for some reason unrelated to the python library specifically, though I've only done a cursory look at this library. As a workaround, I do:
|
Workaround fix for reillysiemens#1. Based on slackapi/python-slack-sdk#118 (comment)
This might seem unrelated, but how many members do your teams have? I'm wondering if this is related to #127 |
I've seen this occur on test teams with a single user (plus the bot). |
I'm also experiencing this issue. @ajferrick it does seem to occur (for me, at least) when saying idle for some time. Also noticed the data it read just before crashing was: Any way to work around this for now? |
Afraid that I'm not adding anything other than "me too"; similar stack trace after being online for some unknown number of hours/days (less than 72 hours, at least). I'll attempt wrapping this in the exception handling listed above (I already had to add exception handling for socket.error to handle connection timeouts). This really seems like it should be more robust, or at the very least the documentation should cover what sort of exceptions we should expect from This may be silly, but is this related to gkbrk/hackchat#1 by any chance?
|
@diamondsw I can't give you a definitive answer there, but from personal experience it seems that running a regular ping task doesn't seem to prevent this error. See my bot's keepalive task. |
Hi guys. Then my bot tries to reconnect and this scenario is usually repeating over and over. |
We encounter it as well every once in a while :( |
i think @phonkee's issue is distinct from the one being discussed here. that is related to very large teams having a lot more data in the we're going to need more data on the problem if we want to solve this issue. one thing we've gathered is potentially disconnects tend to occur when the bot is relatively idle as opposed to busy. i'd really appreciate any other patterns or data points you all have regarding this issue. im creating a related issue for implementing better, deeper logging. if we had that before all these reports, we might have had the right data to diagnose and solve the issue already: #192 |
warn_log_070617.txt I've noticed that in the event preceding every Additionally, i should note that my bot currently runs def _auto_reconnect(self, running):
"""Validates the connection boolean returned via `SlackClient.rtm_connect()`
if running is False:
* Attempt to auto reconnect a max of 5 times
* For every attempt, delay reconnection (F_n)*5, where n = num of retries
Parameters
-----------
running : bool
The boolean value returned via `SlackClient.rtm_connect()`
Returns
--------
running : bool
The validated boolean value returned via `SlackClient.rtm_connect()`
"""
while not running:
if self._retries < self.slacker.settings.max_retries:
self._retries += 1
try:
# delay for longer and longer each retry in case of extended outages
current_delay = (self._retries + (self._retries - 1))*5 # fibonacci, bro
self._logger.info(
"Attempting reconnection %s of %s in %s seconds...",
self._retries,
self.slacker.settings.max_retries,
current_delay
)
sleep(current_delay)
running = self.slacker.rtm_api.rtm_connect()
except KeyboardInterrupt:
self._logger.info("KeyboardInterrupt received.")
break
else:
self._logger.error("Max retries exceeded")
break
return running
def run(self):
"""Passes the `SlackClient.rtm_connect()` method into `self._auto_reconnect()` for validation
if running:
* Capture and parse events via `Slacker.process_events()` every second
* Close gracefully on KeyboardInterrupt (for testing)
* log any Exceptions and try to reconnect if needed
Parameters
-----------
n/a
Returns
--------
n/a
"""
running = self._auto_reconnect(self.slacker.rtm_api.rtm_connect())
while running:
self._logger.debug(
"Slack connection successful using token <%s>. Response: %s",
self.slacker.rtm_api.slacktoken,
running
)
try:
self.slacker.process_events()
sleep(self.slacker.settings.sleep_secs)
except KeyboardInterrupt:
self._logger.info("KeyboardInterrupt received.")
running = False
except Exception as e:
self._logger.exception(e)
running = self._auto_reconnect(self.slacker.rtm_api.rtm_connect()) |
@SupermanIsEvious this data is super useful, thanks so much for sharing! i haven't yet been able to reproduce this myself, but i haven't let my connection stay alive for more than a couple hours. i suspect that we'd have to leave it on for a day to reproduce. i don't have a lot of bandwidth at the moment for investigating further (but i agree its imp and will get to it eventually) so if anyone else wants to help solve this, i'd welcome any contribution! |
This happens to my bot on many early mornings, so after a few quiet hours |
I've observed that some Slack teams have (recently) been opted into an experiment (by SlackHQ) which requires that This issue does not appear to be a connectivity issue (observed from multiple ASNs in different continents), is not limited in scope to one API implementation, and disconnections still occur even when there is a constant stream of events coming from the server. However, this experiment has only been applied to a limited number of teams (I see only one team with the issue and all others without, even when running the same bot code), which don't appear to correlate with age of the team. There is an existing My feeling is that the API surface should be simple to use, so I've fixed this in my own Slack client implementation with micolous/python-slackrealtime@e9c9441 |
I receive this message instantly upon sending anything using For example: def send_typing_indicator(sc, channel):
import json
sc.server.send_to_websocket(json.dumps({'type': 'typing', 'channel': channel})) Untested code! Which ultimately causes the exception to be thrown on the next events read. Maybe that will help in debugging this. |
Fixed in 1.2.0 using See the note at the bottom of this section: https://slackapi.github.io/python-slackclient/real_time_messaging.html#rtm-start-vs-rtm-connect |
Thanks to @celsworth and @micolous for pointing out that Slack now requires this for some teams. From slackapi/python-slack-sdk#118 (comment): I've observed that some Slack teams have (recently) been opted into an experiment (by SlackHQ) which requires that ping messages are sent by RTM clients, otherwise they will be disconnected after 2-3 minutes. This is more frequently than the other reporters, and has only become an issue in the last 4 days for me. However, this experiment has only been applied to a limited number of teams (I see only one team with the issue and all others without, even when running the same bot code), which don't appear to correlate with age of the team. May partly fix #679, but that seems to also be caused by an issue in websocket-client.
The websocket connection drops because of inactivity. When it does, the slackclient library does not automatically attempt to reconnect without explicitly being told to do so. When we try to send a message while the socket is closed, we get an error and the script crashes. Pages related to this issue: slackapi/python-slack-sdk#118 https://slack.dev/python-slackclient/real_time_messaging.html#rtm-start-vs-rtm-connect Possible auto-ping solution required: https://github.com/TheKevJames/jarvis/blob/master/bot/jarvis/core/async.py#L39-L49 ----------------------------- Turns out, the websocket-client library has a fault with 0.55.0. Downgrading to 0.54.0 *should* fix the issue, but I already have 0.54.0 on my local machine, so that's not the problem. Issues: slackapi/python-slack-sdk#385 slackapi/python-slack-sdk#127 slackapi/python-slack-sdk#387
Description
Slack client randomly disconnects with the following stack trace. I have tried adding a thread to my bot that pings on a regular basis, but that doesn't prevent the crash. I found what I thought was a related bug, but I'm not sure if it's the same issue or not. While this bug persists, bots don't stay running and have to be watched by a parent process that restarts it.
Traceback (most recent call last):
File "./vobot.py", line 163, in
mainloop()
File "./vobot.py", line 142, in mainloop
channel, message = read_output(SLACK_CLIENT.rtm_read())
File "/usr/local/lib/python2.7/dist-packages/slackclient/_client.py", line 120, in rtm_read
json_data = self.server.websocket_safe_read()
File "/usr/local/lib/python2.7/dist-packages/slackclient/_server.py", line 140, in websocket_safe_read
data += "{0}\n".format(self.websocket.recv())
File "/usr/local/lib/python2.7/dist-packages/websocket/_core.py", line 298, in recv
opcode, data = self.recv_data()
File "/usr/local/lib/python2.7/dist-packages/websocket/_core.py", line 315, in recv_data
opcode, frame = self.recv_data_frame(control_frame)
File "/usr/local/lib/python2.7/dist-packages/websocket/_core.py", line 328, in recv_data_frame
frame = self.recv_frame()
File "/usr/local/lib/python2.7/dist-packages/websocket/_core.py", line 360, in recv_frame
return self.frame_buffer.recv_frame()
File "/usr/local/lib/python2.7/dist-packages/websocket/_abnf.py", line 312, in recv_frame
self.recv_header()
File "/usr/local/lib/python2.7/dist-packages/websocket/_abnf.py", line 261, in recv_header
header = self.recv_strict(2)
File "/usr/local/lib/python2.7/dist-packages/websocket/_abnf.py", line 346, in recv_strict
bytes = self.recv(min(16384, shortage))
File "/usr/local/lib/python2.7/dist-packages/websocket/_core.py", line 429, in _recv
return recv(self.sock, bufsize)
File "/usr/local/lib/python2.7/dist-packages/websocket/_socket.py", line 89, in recv
raise WebSocketConnectionClosedException("Connection is already closed.")
websocket._exceptions.WebSocketConnectionClosedException: Connection is already closed.
Reproducible in:
python-slackclient version: 1.0.1
Python Version: 2.7.6
OS Version: Ubuntu 14.04.5 LTS
Steps to reproduce:
Expected result:
I expect the bot to keep running.
Actual result:
The bot crashes.
Attachments:
The text was updated successfully, but these errors were encountered: