Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix several crashes on Windows by heavily simplifying the event loop code #1496

Merged
merged 20 commits into from
May 4, 2020

Conversation

Osspial
Copy link
Contributor

@Osspial Osspial commented Mar 6, 2020

  • Tested on all platforms changed
  • Compilation warnings were addressed
  • cargo fmt has been run on this branch
  • cargo doc builds successfully
  • Added an entry to CHANGELOG.md if knowledge of this change could be valuable to users

I'll do a more detailed writeup on the rationale behind these changes when I'm more awake, but the TL;DR is that it drastically simplifies Winit's event loop model by always using the modal event loop. This PR's fixes are an alternate approach to the problems described in #1429 that simplifies the event model more than #1461 does, but it could not exist in its current form without @filnet's work and feedback in those threads.

Fixes #1477, fixes #1496.

@filnet
Copy link
Contributor

filnet commented Mar 6, 2020

Did some testing with the control_flow example.

When using Wait control flow and calling request_redraw from MainEventsCleared:

NewEvents(Init)
NewEvents(Init)
DeviceEvent { device_id: DeviceId(DeviceId(131227)), event: Added }
DeviceEvent { device_id: DeviceId(DeviceId(720955)), event: Added }
DeviceEvent { device_id: DeviceId(DeviceId(65599)), event: Added }
DeviceEvent { device_id: DeviceId(DeviceId(720957)), event: Added }
MainEventsCleared
[...]
NewEvents(WaitCancelled { start: Instant { t: 3125.514983921s }, requested_resume: None })
DeviceEvent { device_id: DeviceId(DeviceId(131227)), event: Motion { axis: 0, value: -1.0 } }
DeviceEvent { device_id: DeviceId(DeviceId(131227)), event: MouseMotion { delta: (-1.0, 0.0) } }
MainEventsCleared
[ Missing RedrawRequested, the call to request_redraw in MainEventsCleared is not honnored]
RedrawEventsCleared
NewEvents(WaitCancelled { start: Instant { t: 3125.514983921s }, requested_resume: None })
MainEventsCleared
RedrawRequested(WindowId(WindowId(0x703d8)))
RedrawEventsCleared

Issues:

  • NewEvents(Init) is dispatched twice
  • Missing RedrawRequested after MainEventsCleared
  • The last event loop (MainEventsCleared / RedrawRequested / RedrawEventsCleared) should not happen.

@filnet
Copy link
Contributor

filnet commented Mar 6, 2020

When using WaitUntil control flow only WaitCanceled gets emitted:

NewEvents(WaitCancelled { start: Instant { t: 5055.05687788s }, requested_resume: Some(Instant { t: 5060.765933731s }) })
MainEventsCleared
RedrawEventsCleared
NewEvents(WaitCancelled { start: Instant { t: 5055.05687788s }, requested_resume: Some(Instant { t: 5060.866509712s }) })
MainEventsCleared
RedrawEventsCleared
NewEvents(WaitCancelled { start: Instant { t: 5055.05687788s }, requested_resume: Some(Instant { t: 5060.967189259s }) })
MainEventsCleared
RedrawEventsCleared
NewEvents(WaitCancelled { start: Instant { t: 5055.05687788s }, requested_resume: Some(Instant { t: 5061.067769905s }) })
MainEventsCleared
RedrawEventsCleared
NewEvents(WaitCancelled { start: Instant { t: 5055.05687788s }, requested_resume: Some(Instant { t: 5061.168097083s }) })
MainEventsCleared
RedrawEventsCleared
NewEvents(WaitCancelled { start: Instant { t: 5055.05687788s }, requested_resume: Some(Instant { t: 5061.268679596s }) })
MainEventsCleared
RedrawEventsCleared

And with this fix only a single WaitCancelled gets emitted and the loop stops.

@filnet
Copy link
Contributor

filnet commented Mar 6, 2020

When using Poll control flow and calling request_redraw from MainEventsCleared:

NewEvents(Poll)
MainEventsCleared
RedrawRequested(WindowId(WindowId(0xb0780)))
RedrawEventsCleared
NewEvents(Poll)
MainEventsCleared
RedrawEventsCleared
NewEvents(Poll)
MainEventsCleared
RedrawRequested(WindowId(WindowId(0xb0780)))
RedrawEventsCleared
NewEvents(Poll)
MainEventsCleared
RedrawEventsCleared
NewEvents(Poll)
MainEventsCleared
RedrawRequested(WindowId(WindowId(0xb0780)))
RedrawEventsCleared

RedrawRequested is dispatched every second iteration.

@filnet
Copy link
Contributor

filnet commented Mar 6, 2020

@Osspial may be we should first merge my PR and then you could try your new approach on top of it. My PR addresses the same issues as yours but is 100% in line with the current code/approach. No revolution there.

If we do that then we get a more stable version of winit on Windows out now and you can then work on the new approach without pressure.

PS: my PR still needs to be reviewed though ;)

@Osspial
Copy link
Contributor Author

Osspial commented Mar 6, 2020

@Osspial may be we should first merge my PR and then you could try your new approach on top of it. My PR addresses the same issues as yours but is 100% in line with the current code/approach. No revolution there.

That sounds like a good approach. I'll try to get your PR, and a bunch of the other Windows PRs that have been waiting on the backburner, reviewed later today.

@Osspial
Copy link
Contributor Author

Osspial commented Mar 6, 2020

@filnet Would you be able to re-test this? I'm pretty sure I've fixed the bugs you brought up in your comments.

@filnet
Copy link
Contributor

filnet commented Mar 7, 2020

Retested the same use cases and it looks fine.
But looking at the PR I don't think the architecture is that simpler.
There are now threads involved and the EventLoopRunner has much more state.

@filnet
Copy link
Contributor

filnet commented Mar 7, 2020

More testing shows that there might still be an issue with WaitUntil control flow.
Moving the mouse over the window will stop the loop until resized.

I am seeing this in my project. I can't reproduce it with the control_flow example.
The symptoms are that no NewEvents/MainEventsCleared/RedrawEventsCleared are sent.
Only WindowEvent and DeviceEvent are sent.

@filnet
Copy link
Contributor

filnet commented Mar 7, 2020

Changing the wait time in control_flow from 100ms to (1000 / 60)ms reproduces the issue.

You need to switch to WaitUntil control flow (by pressing '2'). Then move the mouse out of the window and back in. You should see that you then only get WindowEvent or DeviceEvent.
Strangely when in that state you can't close the window anymore (either with X button or Alt+F4).

Resizing the window will either restore the event loop to a working state or emit a LoopDestroyed event (!?!).

Edit: you get the LoopDestroyed event if you try to close the window in the broken state and then resize.

@Osspial
Copy link
Contributor Author

Osspial commented Mar 7, 2020

But looking at the PR I don't think the architecture is that simpler.
There are now threads involved and the EventLoopRunner has much more state.

I'll admit that the secondary thread introduces a bit more complexity to the implementation, but there are a couple technical reasons that justify its inclusion:

  • Using a secondary thread for the wait timer allows Winit to play much more nicely with externally driven event loops. Because we're using a secondary thread instead of wait_until_time_or_msg, the thread message window's WM_PAINT handler can return immediately instead of blocking until a new event gets received. This reduces the chance for weird bugs if the user calls one of win32's many modal functions while running Winit's event loop, or wants to run Winit as a plugin event loop in an external application (which isn't possible now, but has been requested and should be attainable with this PR's work).
  • This PR flushes all of Winit's WM_PAINT messages in the first WM_PAINT message received by the event loop, even if that message was delivered to a user-facing window rather than the thread event target window. We can't use wait_until_time_or_msg there, since doing so would prevent WM_PAINT from returning until the next event gets received and would delay displaying the redraw to the user.

You need to switch to WaitUntil control flow (by pressing '2'). Then move the mouse out of the window and back in. You should see that you then only get WindowEvent or DeviceEvent.
Strangely when in that state you can't close the window anymore (either with X button or Alt+F4).

That's... certainly bizarre! It'll take a bit to figure out why exactly that happens, since this doesn't seem like a bug with a simple, obvious fix.

@Osspial
Copy link
Contributor Author

Osspial commented Mar 7, 2020

Alright, I fixed it! It was caused by a bug in PeekMessageW, and the bug was triggered by this call. PeekMessageW with PM_NOREMOVE (which is 0) set isn't supposed to remove messages from the event queue, but for whatever reason it was removing WM_PAINT from the queue when that paint message was triggered by a RedrawWindow call with the RDW_INTERNALPAINT flag set. Fixing it was just a matter of re-queuing the WM_PAINT message whenever an internal paint message was received.

@filnet
Copy link
Contributor

filnet commented Mar 8, 2020

Alright, I fixed it! It was caused by a bug in PeekMessageW, and the bug was triggered by this call. PeekMessageW with PM_NOREMOVE (which is 0) set isn't supposed to remove messages from the event queue, but for whatever reason it was removing WM_PAINT from the queue when that paint message was triggered by a RedrawWindow call with the RDW_INTERNALPAINT flag set. Fixing it was just a matter of re-queuing the WM_PAINT message whenever an internal paint message was received.

It is not a bug in PeekMessageW. The documentation mentions it:

The PeekMessageW function normally does not remove WM_PAINT messages from the queue. WM_PAINT messages remain in the queue until they are processed. However, if a WM_PAINT message has a NULL update region, PeekMessage does remove it from the queue.

@filnet
Copy link
Contributor

filnet commented Mar 10, 2020

I'll do a more in depth review this week (with an eye for pre 0.20.0 multithreaded left overs).

One thing I'd like to confirm is if we can get rid of the InvalidateRgn calls. Earlier tests in my branch showed that they are not necessary. The 'SetWindowPos` calls they follow already send the required messages.

@Osspial would be great if you could rebase your branch against master.

@filnet
Copy link
Contributor

filnet commented Mar 12, 2020

Found an issue: in WaitUntil control flow the event loop will not start until a first event happens.

@filnet
Copy link
Contributor

filnet commented Mar 15, 2020

About the WaitUntil issue: it is possible to reproduce it with the control_flow example.

We first need to merge this pull request : #1511

Then you need to change the example to start in WaitUntil mode and change the wait time to zero.

-const WAIT_TIME: time::Duration = time::Duration::from_millis(100);
+const WAIT_TIME: time::Duration = time::Duration::from_millis(0);
...
-    let mut mode = Mode::Wait;
+    let mut mode = Mode::WaitUntil;

With those changes, the event loop does not start. Seems like WaitUntil with an expired time is not well handled.

@Dooskington
Copy link

Any update on this? I was planning on using winit for the upcoming Ludum Dare jam but some Windows specific winit issues might prevent that. If there is any blocking work that needs to be looked at, let me know, I am willing to contribute :)

@0x08088405
Copy link

0x08088405 commented Apr 13, 2020

Seconding this, looks like the fix was implemented a month ago but never merged? Were there some unforeseen side effects of this fix? Can help out too if possible.

Edit: Tested the branch, confirming that it does fix the issue 👍

@Osspial
Copy link
Contributor Author

Osspial commented Apr 19, 2020

Alright, I've figured out how to fix the issue that made #1484 worse. I've tested the control_flow example and everything seems to be working fine. @filnet could you do a quick pass over this to double-check it's working properly? Assuming nothing comes up, this should be good to merge.

@filnet
Copy link
Contributor

filnet commented Apr 22, 2020

Alright, I've figured out how to fix the issue that made #1484 worse. I've tested the control_flow example and everything seems to be working fine. @filnet could you do a quick pass over this to double-check it's working properly? Assuming nothing comes up, this should be good to merge.

Will test that branch this weekend.

Copy link

@krooq krooq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please add some doco to the run_return method.
This guy is the workhorse of this file it would just be great to have some high level stuff especially for the non windows dev folks like me, unless I go to the Windows API I cant tell what this is doing.
(Of course I did go to the windows API and its very useful)

Things like

  • GetMessageW blocks until a message is received and it returns -1,0,1
  • Why are we Translating/Dispatching messages?
  • What does runner.poll() do

The docs on flush_paint_messages are amazing.

I liked how explicit the control flow was before, now you have to dig a little deeper to find it.
It's fine, I just would like to know everything by reading this one method.

@filnet
Copy link
Contributor

filnet commented Apr 26, 2020

Quickly tested all control flow modes with and without calling request_redraw (using the control_flow example).
Looks good to me.

@Osspial Osspial merged commit b4c6cdf into rust-windowing:master May 4, 2020
@Osspial Osspial deleted the fix-everything branch May 4, 2020 19:14
VZout added a commit to EmbarkStudios/winit that referenced this pull request Jun 9, 2020
VZout added a commit to EmbarkStudios/winit that referenced this pull request Jun 18, 2020
VZout added a commit to EmbarkStudios/winit that referenced this pull request Jun 18, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
5 participants