-
-
Notifications
You must be signed in to change notification settings - Fork 21.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Timestamped input buffering - prevent stalling and improve timing #77062
base: master
Are you sure you want to change the base?
Conversation
132d3db
to
a4982fb
Compare
Input::get_singleton()->set_use_input_buffering(true); // Needed because events will come directly from the UI thread | ||
Input::get_singleton()->set_has_input_thread(true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Knowing why these are not exclusive would help me understand the new approach better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe I can add a note about this.
It is rather confusing but is historical - accumulated input was added as an afterthought, and it overrides the use_input_buffering
setting. I did attempt to change this to something more sensible, but it would have been compat breaking. (This is why the unit tests were failing when I first made the PR, so I reverted to logic mirroring the legacy.)
If you look in the _update_buffering_mode()
logic you can see that buffering will still be used if use_input_buffering
is false (!) if use_accumulated_input
is true. This is because accumulated input requires buffering to work. This is how the legacy logic works.
If we are prepared to break compatibility, then we can probably make some simpler exclusive logic.
- Also note that agile input only currently activates when there is a separate input thread. So input buffering can effectively be on (in frame mode) without a separate input thread.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have added some comments in the _update_buffering_mode()
function to this effect.
Incidentally if we do decide to break compatibility in order to simplify, am wondering if this would be better to do as a followup PR, because changing the logic is more likely to lead to regressions.
// and smoothed time / realtime can get out of sync. | ||
mft.first_physics_tick_logical_time_usecs = current_cpu_ticks_usec; | ||
mft.first_physics_tick_logical_time_usecs -= (mft.physics_steps * mft.usec_per_tick) + leftover_usec; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to note that the timing calculations for the first logical physics tick are more complex when delta smoothing is being used (as there is time compression and expansion compared to wall clock time). I'm just fixing this up for the 3.x PR, but delta smoothing isn't merged yet in 4.x so will leave the simple version here for now.
Additionally "jitter fix" can significantly change input tick timings. For now I would advise setting jitter fix to 0 when agile input is in use, I will leave fixing this properly to a follow up PR. See #76839 (comment) for details.
incoming.resize(incoming.size() + 1); | ||
Event &e = incoming[incoming.size() - 1]; | ||
e.event = p_event; | ||
e.timestamp = p_timestamp; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we just call push_event
here to reuse code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Duplicating here avoids a redundant extra lock, especially as this will not be a rare path.
Alternatively we could place the push
in a separate function from the locks, but that might be overkill for the few lines of code here (Godot codebase tends towards duplication rather more than I would personally, but it's no biggie).
Input was resulting in stalling and ANRs on Android and possibly other platforms. This PR separates buffered input into two parts to prevent stalls. It also timestamps input events and applies them on the relevant logical physics tick, rather than attempting to apply them "realtime".
On reflection I made a change to the scheme:
I've tested this on Android and agile input is still working, and it should be far simpler backward-compatibility wise. |
I tried cherry picking this to the 4.1 branch for public testing with a large audience, but I could not compile it anymore.
|
Can you point out where it's currently not threadsafe?
Can you show what exactly is the current 'stall' on Android? By stall, do you mean it's less efficient than it can be? Or do you mean it's deadlocking? |
Input was resulting in stalling and ANRs on Android and possibly other platforms. This PR separates buffered input into two parts to prevent stalls. It also timestamps input events and applies them on the relevant logical physics tick, rather than attempting to apply them "realtime".
Master version of #76839
(see that PR for more details and demo project)
This PR is quite a significant refactor to input buffering in order to make it threadsafe, store timestamps, and offer accurate agile input on physics ticks on platforms with separate input thread. It also offers a legacy mode (with a project setting) in case of regressions, this can be removed once any bugs have been worked out, which should make the code simpler.
N.B.
This PR was written immediately prior to #76399 being merged, which may have addressed some of the similar concerns for ANRs. This PR is thus more concerned with agile input than fixing ANRs.
Explanation
The fix in #76399 was to deal with the problem that processing input events (which could take an extended amount of time, for say a load or reading a web address etc) was locking the input buffer. It took a simple approach of micro-unlock / relock for each input event.
This PR is a bit more involved, it changes the buffer into a more thread safe structure.
When flushing the master buffer can thus be flushed up to a desired timestamp. This enables us to flush on physics ticks up to the logical timestamp of the physics tick (rather than realtime of when the tick is processed).
This means input can be spread out correctly over logical time - if a frame has e.g. 8 physics ticks, an on off keypress can be assigned to say ticks 2 and 3, instead of being only applied if the key happens to be pressed at the frame time. This is essentially a (hopefully! 😄 ) more accurate version of @RandomShaper 's agile input.
Accumulating
In AGILE mode, accumulating input (e.g. mouse motion) is deferred until flushing, which means that accumulation can be done accurately on a per tick basis. This can potentially give perfectly accurate input for say first person shooters, while still using accumulation to reduce the number of calls to user code and thus increase efficiency. In FRAME mode (and OS that do not support AGILE), accumulation still takes place correct to the nearest frame, as before.
Input thread
There is one snag, for the input timestamps to be meaningful, we ideally want input to come in free flow from the OS. This so far only occurs on Android, where there is a separate thread for input. On most of the other platforms, the OS message pump is done once per frame, so the end result is the input gets bunched up like the current status quo.
However, if we have the input backend working correctly with timestamps / agile input, that does leave the option open to now improve the input accuracy on other platforms in later PRs.
We can either investigate using a separate input thread (ideally) or perhaps resampling the input queue from the OS multiples times per frame (in single threaded fashion). The latter is not ideal but could lead to some greater input resolution.
Notes
legacy_event_flushing
project setting for more backward compatibility (but even so, input is quite different so will need testing).