
i3 freezes when ipc client becomes unresponsive #2280

Closed
cornerman opened this issue Apr 6, 2016 · 15 comments
Labels
4.12 bug reproducible A bug that has been reviewed and confirmed by a project contributor

Comments

@cornerman
Contributor

Output of i3 --moreversion 2>&- || i3 --version:

Binary i3 version: 4.12-21-g66d9c98 (2016-04-04, branch "next") © 2009 Michael Stapelberg and contributors
Running i3 version: 4.12-21-g66d9c98 (2016-04-04, branch "next") (pid 16730)
Loaded i3 config: /home/cornerman/.i3/config (Last modified: Wed 06 Apr 2016 01:41:05 PM CEST, 32 seconds ago)

The i3 binary you just called: /usr/bin/i3
The i3 binary you are running: i3

URL to a logfile as per http://i3wm.org/docs/debugging.html:

http://logs.i3wm.org/logs/5728116278296576.bz2

What I did:

Subscribe to an IPC event from within the statusline generator. Then switch to fullscreen mode or hide i3bar, so that the bar is hidden and the status command receives a SIGSTOP. Afterwards, trigger the subscribed IPC event a few times.

What I saw:

i3 freezes while trying to send the IPC event to the stopped client, which no longer reads new events. Here, write() returns EAGAIN because it would block on the full socket, so i3 ends up in an infinite retry loop.
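The failure mode can be reproduced locally without i3. The sketch below (not i3's actual code) shows a non-blocking write to a socket whose peer never reads: the kernel buffer fills and the write fails with EAGAIN, which is exactly the error a naive retry loop would spin on forever.

```python
import socket, errno

# Minimal sketch of the failure mode: the peer socket `b` stands in for the
# SIGSTOPped client that never reads, and `a` for i3's side of the IPC socket.
a, b = socket.socketpair()
a.setblocking(False)

sent = 0
err = None
try:
    while True:
        sent += a.send(b"x" * 4096)  # peer `b` never reads, so the buffer fills
except BlockingIOError as e:
    err = e.errno                    # EAGAIN (== EWOULDBLOCK on Linux)
a.close()
b.close()
```

A loop that retries the send on EAGAIN instead of giving up never terminates, which matches the observed freeze.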

What I expected instead:

i3 should keep running and should not depend on the health of its IPC clients. If sending an event fails, it should be ignored. Even though one could argue that the statusline generator might not be the right place to subscribe to events, the same freeze happens for any client that becomes unresponsive.

@i3bot i3bot added the 4.12 label Apr 6, 2016
@Airblader
Member

This is a duplicate of #1876, but we now have an actual case where this causes problems. And it sounds like a reasonable case to me. @stapelberg What do you think?

@Airblader Airblader added the bug label Apr 6, 2016
@stapelberg
Member

One solution might be to re-architect the status line generator in such a way that the part which receives IPC events from i3 is in a separate process that is never SIGSTOP'ed. I’d recommend that for now, simply because it’ll work right away.
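This recommended split can be sketched as follows (a hypothetical architecture for illustration, not i3bar's real code): a small child process does nothing but drain IPC events, so the writer side, standing in for i3, never blocks even while the main status-generator process is SIGSTOPped.

```python
import os, socket

# `i3_side` stands in for i3's end of the IPC socket, `reader_side` for the
# dedicated event-draining process that is never SIGSTOPped.
i3_side, reader_side = socket.socketpair()
pid = os.fork()
if pid == 0:
    # Child: the never-stopped reader; drains events until EOF, then exits.
    i3_side.close()
    while reader_side.recv(4096):
        pass
    os._exit(0)

reader_side.close()
for _ in range(500):                  # far more data than one socket buffer
    i3_side.sendall(b"event" * 1024)  # completes because the child drains it
i3_side.close()
os.waitpid(pid, 0)
writer_finished = True
```

Because the drain loop lives in its own process, SIGSTOPping the generator (the parent here) would no longer back-pressure the socket.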

With regards to addressing this in i3, I’m not sure what all the possible ways to address the issue are. Some thoughts, feedback welcome:

  • Just stopping the write in the middle of a message (and messages do span multiple write() calls, at least on OpenBSD) does not sound like a good strategy, as clients are likely not prepared to disregard incomplete messages and the protocol was not designed to be self-synchronizing.
  • Shutting down the socket to allow i3 itself to make progress might not be a great idea either. Long-running tools such as time-tracking applications which log window titles/focus switch events might just be temporarily slow, but catch up once the scheduler on the loaded machine gets around to scheduling the time-tracking application again.
  • Buffering the data in i3 might be feasible but only seems like a stop-gap to me. It doesn’t make sense to have the buffer grow without bounds (might trigger the OOM killer) and when the buffer is full, we have the same problem as before.

Did I miss an option?

@cornerman
Contributor Author

Yeah, fixing the statusline generator in this manner is definitely the way to go, as the event processing should not be stopped. Still, i3 should stay responsive.

You are right, the first two options would be pretty hard on the client. However, I think buffering all new events for each client would be overkill. How about only buffering the part of the last event message which could not be written completely? So, whenever an event should be dispatched to the client, we first try to send the rest of its incomplete message and only then proceed with sending new events. In this way, the client might miss some messages, but I guess that is okay.
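This proposal can be sketched as follows (names invented for illustration): per client, buffer only the unwritten tail of the last message; while that tail is pending, drop new events instead of queueing them.

```python
class Client:
    def __init__(self):
        self.pending = b""  # unwritten remainder of the last message

def dispatch(client, message, write):
    """write(data) -> number of bytes accepted (a non-blocking send)."""
    if client.pending:
        client.pending = client.pending[write(client.pending):]
        if client.pending:
            return False        # still blocked: drop this new event
    n = write(message)
    client.pending = message[n:]
    return True                 # event accepted; any unsent tail is kept

# Simulate a client whose socket accepts only 4 bytes per write.
c = Client()
ok1 = dispatch(c, b"event-one", lambda d: min(4, len(d)))  # partial write
ok2 = dispatch(c, b"event-two", lambda d: min(4, len(d)))  # dropped
```

The buffer is bounded by one message, so there is no unbounded growth; the cost is that events arriving while a tail is pending are lost.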

@stapelberg
Member

> How about only buffering the part of the last event message which could not be written completely? So, whenever an event should be dispatched to the client, we first try to send the rest of its incomplete message and only then proceed with sending new events. In this way, the client might miss some messages, but I guess that is okay.

I agree that this suggestion is better than the points I brought up earlier. But, the crucial detail is dropping messages silently: as I mentioned above, clients which rely on getting all messages (think time trackers) would be broken by such a change.

@cornerman
Contributor Author

True, but to me, dropping messages seems like a better option than blocking the whole WM. Maybe one could add a message signaling that events were missed, to make clients aware. But I am not sure whether this really helps.

Would it make sense to store the most recent message for each event type? So, the client at least knows the current state, which might help in some cases.
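The "newest message per event type" idea can be sketched like this (hypothetical names): older payloads of the same type are superseded, so an unblocked client catches up to the current state instead of replaying history.

```python
latest = {}

def queue_event(event_type, payload):
    latest[event_type] = payload      # replaces any older payload of this type

def flush(write):
    """Deliver the newest payload of each event type, then forget it."""
    for etype in list(latest):
        write(etype, latest.pop(etype))

queue_event("workspace", b"ws:1")
queue_event("workspace", b"ws:2")     # supersedes ws:1
queue_event("window", b"focus")
delivered = []
flush(lambda t, p: delivered.append((t, p)))
```

Memory use is bounded by the number of event types, but clients that need every intermediate event (such as the time trackers mentioned above) would still miss messages.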

@Airblader
Member

Can't we move the writing part of the IPC implementation into a separate process?

@cornerman
Contributor Author

But wouldn't this lead to a similar problem? Either block all IPC writing or buffer the failed messages, which would ultimately lead to dropping messages.

@Airblader
Member

We would be blocking, yes, but we wouldn't block the window manager.

Blocking seems reasonable to me. The client is stopped, after all. If the client wants to do something despite being blocked, it should read messages in a separate process as well.

@cornerman
Contributor Author

But there might be more clients, which would also be blocked.

@Airblader
Member

Hm. Yeah, valid point.

@Airblader Airblader added the reproducible A bug that has been reviewed and confirmed by a project contributor label Apr 7, 2016
@stapelberg
Member

I think there’s just no good option here. We’d need an option which is clearly better in order to justify changing the current behavior.

I’d recommend either not SIGSTOPping the process, or implementing a different policy for handling messages in a separate process which is not SIGSTOPped (think a proxy for i3ipc).
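The proxy policy can be sketched in a few lines (a hypothetical illustration, not an existing tool): the proxy, which is never SIGSTOPped, subscribes to i3 itself and applies its own policy toward the slow client. Here that policy is a bounded backlog that evicts the oldest events, so i3 never sees the client's slowness.

```python
from collections import deque

backlog = deque(maxlen=3)             # bound is arbitrary for this sketch

for event in ["e1", "e2", "e3", "e4", "e5"]:  # events arrive, client stalled
    backlog.append(event)             # deque silently evicts the oldest entry

delivered = list(backlog)             # what the client sees once it resumes
```

The key property is that the drop policy lives in the proxy, so neither i3 nor other IPC clients are affected by one unresponsive consumer.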

@pizdjuk

pizdjuk commented Jan 30, 2021

I think I ran into this issue using altdesktop/i3ipc-python#174

The bug is triggered a few times a day.

@orestisfl
Member

What's your i3 version?

@pizdjuk

pizdjuk commented Jan 30, 2021

i3 version 4.17.1 (2019-08-30)

@orestisfl
Member

orestisfl commented Jan 31, 2021 via email


6 participants