Feature request: combo keys #320

Closed
herrsimon opened this issue Sep 4, 2022 · 12 comments

Comments

@herrsimon
Contributor

As a result of the discussion in #313, I hereby propose to add combo keys.

Functionality

Pressing multiple keys simultaneously activates a dedicated action instead of their individual bindings. Pressing simultaneously here means that the time between the first and last press has to lie below a predefined threshold, typically low enough to not interfere with normal typing.

Implementation proposal

Combos should be defined via

<key1>+<key2>+...+<keyn> = <action>

All credit for this incredibly expressive and simple syntax goes to @rvaiya.
The number of keys should of course be at least two and at most some internal MAX_COMBO_SIZE variable (which in my opinion should be ten, the number of fingers). In addition, there should be a global combo_window variable (alternative names: combo_{timeout,term,threshold}), indicating the time in milliseconds within which all keys constituting a combo need to be pressed in order to trigger the combo.

A combo is triggered when the key down events of all constituting keys have occurred (in arbitrary order) within a time window of at most combo_window ms, with no other key up or down events between the first and last press in that window.
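A minimal Python sketch of the trigger condition described above, assuming events arrive as (timestamp in ms, key, kind) tuples. The function name and the event layout are illustrative assumptions, not part of keyd:

```python
def combo_triggered(events, combo, combo_window):
    """Return True if the combo is triggered somewhere in `events`.

    events:       list of (timestamp_ms, key, kind) with kind "down" or "up",
                  sorted by timestamp (hypothetical layout).
    combo:        set of key names that make up the combo.
    combo_window: maximum time in ms between the first and last press.
    """
    combo = set(combo)
    n = len(combo)
    # Slide over every run of n consecutive events: "no other key up or
    # down events in between" means the presses must be contiguous.
    for i in range(len(events) - n + 1):
        run = events[i:i + n]
        if any(kind != "down" for _, _, kind in run):
            continue
        # n events covering exactly the n combo keys: each pressed once,
        # in arbitrary order.
        if {key for _, key, _ in run} != combo:
            continue
        if run[-1][0] - run[0][0] <= combo_window:
            return True
    return False
```

For instance, `combo_triggered([(0, "f", "down"), (2, "d", "down")], {"d", "f"}, 25)` holds, while the same two presses 30 ms apart, or with another key event between them, do not qualify.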

While reading the manual pages of QMK and ZMK (see here for QMK and here as well as here for ZMK), the following details came up.

  • allowed combo length
    I argue against any artificial restriction on combo length beyond a cap of ten (the number of fingers), even though most users will probably not use more than four keys per combo. As my test results show (see below), pressing all ten fingers down within a sufficiently small time window during normal typing is possible.

  • release of imaginary combo key
    If a combo is triggered, behaviour should be as if a virtual combo key was pressed, so that binding combos to actions like timeout, overload, overload2, layer etc. work as expected. The release of that virtual key should happen when all of the constituting keys have been released (more precisely, when there has been at least one key up event for every constituting key). This way, when binding to layer, one could first trigger the combo and then release all but one of the constituting keys, allowing fingers on the same hand to press other keys (even some which constitute the combo) while still holding the layer.
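The release rule for the virtual combo key can be sketched as follows. This is only an illustration of the rule as worded above; the class and method names are hypothetical:

```python
class VirtualComboKey:
    """Tracks when the virtual key of a triggered combo should be released."""

    def __init__(self, keys):
        # Constituent keys that have not yet seen a key up event
        # since the combo was triggered.
        self.pending = set(keys)

    def key_up(self, key):
        """Record a key up event.

        Returns True exactly once: when every constituent key has been
        released at least one time. Releases of other keys, or repeated
        releases of an already released constituent, change nothing.
        """
        if not self.pending:
            return False
        self.pending.discard(key)
        return not self.pending
```

With this rule, a constituent key that is released and then pressed again (for example to type a letter on the held layer) does not delay or retrigger the release of the virtual key.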

  • simultaneously active combos
    There should be at most two combos allowed to be active (two imaginary combo keys pressed) at a time. Allowing more than two would only make sense for superhumans with either more than two hands or tentacle-like finger flexibility. Whether two are indeed needed is debatable. I argue for two in order to cover rapid successive activation of combos triggered with different hands, for example

[main]
d+f = oneshot(alt)
k+l = oneshot(control)
  • disambiguation between combos
    For unambiguous combo matching, the following rules should be applied:

Whenever multiple overlapping time windows with width of at most combo_window ms could be laid over a given sequence of key presses, all resulting in valid combo triggers, the following rules should be applied to all such possible windows (in this order) to arrive at a unique choice:

  1. Discard all windows nested in some other window (longer combos trump shorter combos)
  2. Out of the remaining windows, choose the one starting earliest in time

For example, consider a combo_window of 25 ms and the configuration

[main]
d+f = ...
f+s = ...
a+s = ...
f+s+d = ...
f+s+a = ...

Then

  • <d_down> <2ms> <f_down> <10ms> <s_down> <50ms> would resolve to d+f+s (immediately after <s_down>, as there are no longer combos present in the config)

  • <d_down> <2ms> <f_down> <60ms> <a_down> <50ms> would resolve to d+f (at <a_down>), followed by a (appearing 25 ms after <a_down>, as a is part of another possible combo)

  • <d_down> <2ms> <f_down> <10ms> <a_down> <5ms> <s_down> <50ms> would resolve to d+f (immediately after <a_down>), followed by a+s (25 ms after <s_down>, to guard against a potential a+s+f combo). Even though the above sequence also contains the longer combo f+a+s within combo_window ms, it is not chosen, as the combo d+f starts earlier in time. The successfully resolved combo is then removed from the sequence, leaving <a_down> <5ms> <s_down> <50ms>, which resolves to a+s.
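The two disambiguation rules can be sketched in Python as below. This is a simplified illustration under stated assumptions: only key down events are modelled, the delayed resolution (waiting out combo_window) is ignored, and all function names are hypothetical:

```python
def candidate_windows(presses, combos, combo_window):
    """Find every run of consecutive presses that matches a configured combo.

    presses: list of (timestamp_ms, key) key down events, sorted by time.
    combos:  iterable of frozensets of key names.
    Returns a list of (start_index, end_index_exclusive, combo).
    """
    found = []
    for combo in combos:
        n = len(combo)
        for i in range(len(presses) - n + 1):
            run = presses[i:i + n]
            in_time = run[-1][0] - run[0][0] <= combo_window
            if {key for _, key in run} == combo and in_time:
                found.append((i, i + n, combo))
    return found


def is_nested(inner, outer):
    """True if inner's index range lies inside outer's and the ranges differ."""
    return (outer[0] <= inner[0] and inner[1] <= outer[1]
            and (inner[0], inner[1]) != (outer[0], outer[1]))


def choose_window(windows):
    # Rule 1: discard windows nested in some other window.
    outermost = [w for w in windows
                 if not any(is_nested(w, other) for other in windows)]
    # Rule 2: of the remaining windows, take the one starting earliest.
    return min(outermost, key=lambda w: w[0], default=None)


combos = [frozenset(c) for c in (
    {"d", "f"}, {"f", "s"}, {"a", "s"}, {"f", "s", "d"}, {"f", "s", "a"})]
# Third example above: d, f, a, s pressed at 0, 2, 12 and 17 ms.
presses = [(0, "d"), (2, "f"), (12, "a"), (17, "s")]
winner = choose_window(candidate_windows(presses, combos, 25))
```

Here the candidates are d+f, a+s and f+s+a. a+s is nested in f+s+a and is discarded, and d+f wins over f+s+a because it starts earlier, matching the resolution described in the third example.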

  • precedence over actions
    Naturally, scanning for press or release of a virtual combo key should occur before keys are handed over for further processing of any actions. In other words, all key up and key down events arriving from the physical keyboards first go through a “combo filter”, which internally replaces some of them by up and down events of virtual combo keys. The filtered event sequence is then processed normally. When documented properly, this results in completely intuitive and understandable behaviour for the user. For example, consider a combo_window of 100 ms and the configuration

[main]
d+f = layer(shift)
d = timeout(d, 150, layer(control))

Then <d_down> <50ms> <d_up> yields d (after 100 ms), <d_down> <60ms> <a_down> yields da (without delay, as both the combo and the timeout disambiguate immediately), and <d_down> <17ms> <f_down> <40ms> <a_down> yields S-a (and not df according to the timeout binding, as combos are scanned for first).

  • introduction of optional required hold time?
    This question arose when reading this blog post, where a QMK user successfully triggers combos if all keys are pressed within some time window and at least one of the keys is held longer than a predefined threshold. To quote from the post:

By setting both of these constraints, it becomes very difficult to accidentally trigger the combo in normal typing.

My testing below suggests that this could indeed be useful when using cross-hand combos. Note that one cannot solve this by combining with overload2, as in d+f = overload2(shift, macro(df), 200), as this would not work when f is pressed before d within combo_window but then not held for 200 ms. The problem here is that the order in which keys have been pressed would need to be preserved. A global option would also not be optimal, as for some combos an additional hold time (incurring additional delay) is not necessary. So if this gets implemented, the hold time would have to be an optional per-combo setting (for example in the form f+j 200ms = layer(shift), which somewhat destroys the simplicity and elegance of a plain f+j = layer(shift)). I haven't taken sides on this issue yet.
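A rough Python sketch of the two constraints from the quoted blog post (all keys pressed within combo_window, and at least one key held for a minimum time). The function name, the dictionary layout and the use of "at least hold_ms" rather than "strictly longer" are assumptions made for illustration:

```python
def combo_with_hold(downs, ups, combo_window, hold_ms):
    """Decide whether a combo with a required hold time fires.

    downs, ups: dicts mapping each combo key to its key down and key up
                timestamp in ms (hypothetical layout).
    """
    # Constraint 1: first and last press lie within combo_window ms.
    pressed_together = max(downs.values()) - min(downs.values()) <= combo_window
    # Constraint 2: at least one key was held for hold_ms or longer.
    held_long_enough = any(ups[key] - downs[key] >= hold_ms for key in downs)
    return pressed_together and held_long_enough
```

A fast cross-hand bigram such as f pressed at 0 ms and j at 5 ms, both released within 60 ms, satisfies the first constraint but not the second, so it would be typed as plain letters.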

Initial testing

I systematically tested triggering different types of combos during normal text typing, to simulate realistic settings. Combos consisting of up to ten fingers were used. They either activated some modifier or resulted in input of some foreign character or symbol (German umlauts, brackets etc.).

The most important findings are:

  • the time between key presses on different hands can be as low as 5 ms during normal text typing (and this is not just an exotic case). In contrast, for key presses on the same hand the times never fall below 25 ms unless one makes a conscious effort to press keys in short succession, resulting in unnatural typing behaviour for the rest of the word.
  • as long as one stays on the homerow and with a natural thumb key (space or a central key on a thumb cluster), combos involving all ten fingers can consistently be pressed in about 10 ms on average, with a maximum of 21 ms. The average and maximum significantly improve to about 7 ms and 16 ms when restricting to one hand combos

This shows that combos pressed with just one hand should be usable without any restriction, while for combos made up of keys under both hands there will be an unavoidable ambiguity. However, this could be addressed by requiring a minimal hold time (see above). Additional findings:

  • leaving the homerow makes things significantly worse (but still keeps maximum times below 25 ms most of the time). One exception is two-finger combinations involving the index finger.
  • just using one hand or leaving out the thumbs does not result in noticeable improvements (contrary to my initial expectations)
  • leaving out the pinkies results in a noticeable improvement
  • the statistical variance significantly decreases when restricting to one hand and then again when restricting to two fingers, one of which is the index finger. However, as the maximum time stays well below 25 ms in all these cases, variance is not important here.

Disadvantages

Keys appearing in a combo declaration have an added visual latency.

Relation to overload2

One could try to replicate the proposed combo key functionality by using timeout-based actions and dedicated “combo layers” (see #313). However, such solutions are not optimal: for example, they might involve a minimal hold time for triggering combos, and they are tedious and involved to set up.
Instead, I think that triggering combos purely based on the time between key presses is both easier to understand and also more powerful. In fact, some people might find it advantageous to use combos and overload2 in combination, for example by d+f = overload2(shift, %, 200).

Use cases and comparison to existing dual-role functionality

Combo keys offer a way to attach a second role to a physical key, and as such a priori compete with the timeout, overload and the proposed overload2 (see #309) actions. However, all those actions involve the hold time in some sense, while combo keys are exclusively (or primarily) press time focused, which in my opinion is a fundamental difference.

In the particular comparison with overload2, one has the biomechanical effort of having to press multiple keys almost simultaneously on the combo key side, which competes with the motor skill of consistently holding a key for different amounts of time, depending on the intended role. Objectively, I can't identify a clear winner here. Furthermore, in my opinion it even makes sense to use both of these functionalities at the same time by assigning a timeout or overload action to a combo key.

Here are a few use cases:

  • homerow modifiers via combos
    For example via
[main]
f+d = layer(shift)
f+s = layer(alt)
f+a = layer(control)
s+d = layer(meta)

and analogous bindings for the right hand. Instead of modifiers, one could of course also put generic layers (for entering symbols, accented letters, mathematics, macros etc.) on the homerow or somewhere else. Also, at the cost of more delays, one could add additional layers to the homerow, for example by adding

f+d+s = oneshot(symbols)
f+d+a = oneshot(numbers)
space+f+d = oneshot(greek)

to the above config. Composite modifiers would make sense as well, adding

f+d+s = oneshot(shift_alt)
f+s+a = oneshot(control_alt)

[control_alt:C-A]
[shift_alt:S-A]

instead.

  • plain extra keys
    For example a+f = ä.

  • combination with regular overloads or timeouts
    Bindings like a+f = overload2(layer(control), ä, 200) or similar involving timeout or overload.

@herrsimon
Contributor Author

I just read #111. The current best timeout-based solution proposed there could be simplified a lot by introducing combo keys, so I add this as an additional use case. It would simply boil down to the most natural solution, intuitively proposed by the OP when opening the thread, namely control+shift+alt+meta = layer(meta).

@rvaiya
Owner

rvaiya commented Sep 5, 2022

The number of keys should of course be at least two and at most some internal MAX_COMBO_SIZE
[...]

I think we are again conflating design with implementation and making things needlessly complex. Most of what you describe is a natural byproduct of the latter.


The proposal:

Allow for a+b[+c..] = <action>

Where a, b, and c are keys which are simultaneously depressed (struck within a certain time interval).


The high level question is simple:

Should native support for key chording be added?

Arguments in favour:

  • It would reduce visual latency by adding a dedicated disambiguation mechanism (since the chorded interkey interval is much lower than the normal one).
  • It would allow arbitrary actions to be mapped to key combinations.
  • It would provide a cleaner syntax.

Arguments against:

  • This is a fairly niche feature which adds a lot of internal state. (technically an implementation concern)
  • 90% of the benefits can be realized using overload2 (as suggested in How to filter key interrupts in timeouts? #313 (comment)).
  • It adds a hidden timeout to core functionality (since the problem of chording cannot be solved using sequences).

Presently I am opposed to it, but I will allow others to weigh in.

@rvaiya
Owner

rvaiya commented Sep 5, 2022

@slakkenhuis
@alefbragin

^

@herrsimon
Contributor Author

I think we are again conflating design with implementation and making things needlessly complex. Most of what you describe is a natural byproduct of the latter.

Sorry, I apparently got overexcited again.

Some additions to your summary:

Arguments in favour:

  • It would provide a cleaner syntax.

To put this into perspective: The layers and needed lines when using overload2 or timeout grow exponentially (with a linear factor) in the combo length, so that at least 5, 13, 33 and 81 lines are needed to configure a single combo of size 2, 3, 4 and 5, respectively. Things get worse when multiple nested or overlapping combos are desired. Compare this with just a single line needed with the current proposal.
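As a quick arithmetic check, the line counts quoted above are consistent with the expression n * 2^(n-1) + 1 for a combo of n keys (exponential with a linear factor). The closed form is an inference from the four figures given, not something stated elsewhere in this thread:

```python
def lines_needed(n):
    """Assumed closed form matching the quoted counts for a combo of n keys."""
    return n * 2 ** (n - 1) + 1


# Combo sizes 2 to 5 reproduce the figures from the comment above.
counts = [lines_needed(n) for n in (2, 3, 4, 5)]
print(counts)  # [5, 13, 33, 81]
```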

Besides, the syntax is not only clean, but exceptionally simple, elegant and expressive. To really appreciate it, compare with what is needed in QMK, ZMK or TMK. I would bet money on the claim that new users won't be able to fully grasp the meaning without resorting to the documentation, in stark contrast to what is proposed here.

Arguments against:

  • This is a fairly niche feature which adds a lot of internal state.

While I fully agree that the implementation is rather involved, I don't think that combos are a niche feature (or do you have some data to back this claim up?). At least it seems to me that it is not more of a niche than most other actions that keyd offers. Whenever I encounter feature listings or comparisons of keymapping software/firmware, combos are always prominently on the list (see for example the comparison tables here or here). I conclude from this that combo functionality is an integral part of every remapping solution and hence keyd should offer decent support for it in some way (not necessarily natively). If the claim

90% of the benefits can be realized using overload2

were true, everything would be fine, but I strongly disagree with the 90%. Actually I would go as far as saying that the suggested technique does not offer a valid (partial) alternative at all, because the statement

a = overload2(a, a, 50)

where the first argument is the layer a and the second the letter key a, makes it almost impossible to type the letter a (by tapping in under 50 ms). For me, the optimal combo interval is somewhere in the region of 25 ms, making things even worse. To achieve combo-like behaviour with currently available means, I see no other way than using timeout as described in the same thread, but this introduces significant other problems, which don't seem to be fixable using currently available functionality.
Anyway, because of the cumbersome and exploding syntax, I doubt that anybody would actually configure longer combos or deal with multiple nesting or overlapping ones. We could now of course say that longer combos are not useful or usable and hence keyd should steer users away from them, but then all stenotypists would prove us wrong.

@nsbgn

nsbgn commented Sep 7, 2022

I don't think that it is a niche feature at all --- it's just a variant of overloading letter keys. That's a subject that @rvaiya and I might want to put into a niche... but we've already seen numerous issues of users expecting to be able to do that. (In fact, we experimented with it as well!)

Given that people want to overload letter keys, it seems a good idea to push them in the right direction. And if a right direction exists, my intuition is that combos are it. They are a relatively benign special case of letter key overloading: visual latency is limited (by both the tiny interval and the fact that interrupts by non-combo keys can immediately resolve the ambiguity). Misfires are mitigated by the limited context in which ambiguities can occur, and are further minimized when adding a required hold time. Finally, users won't have to worry about picking the right timeout mechanics and layer configuration that we've been wrestling with lately.

The hidden timeout is a bit magical, but it makes for much cleaner configs, and wouldn't require more manual-reading than a repeated explicit timeout would. (But the inconsistency with timeouts being explicit in other areas would bother me a little.)

My one worry is that the proposed syntax is so nice and natural that it might misguide users into thinking that what they're doing is trivial and/or the most obvious solution. I also foresee more issues caused by rollover, and we'd have to caution that control+f is not the same as C-f. On the other hand, there would now be a natural point in the manual to point out such drawbacks.

On the whole, I'm unexpectedly finding myself in favour of this feature --- much less on the fence than I was about overload2. As @herrsimon mentioned, where overload2 overlaps with this use case, it quickly becomes unwieldy. If I had to choose between the two [1], I'd drop the latter in a heartbeat.

Disclaimer: This endorsement is on purely theoretical grounds as of time of writing :P

Footnotes

  1. Don't worry, I realize they serve different purposes, despite the overlap!

@rvaiya
Owner

rvaiya commented Sep 9, 2022

The layers and needed lines when using overload2 or timeout grow exponentially (with a linear factor) in the combo length,

In practice I would expect combo length to almost always be 2 and the total number of them to be relatively small.

I don't think that combos are a niche feature (or do you have some data to back this claim up?).

It is merely a hunch :P. It strikes me as mainly being useful for anorexic keyboards like the gergo where real estate is precious.

My main opposition was really the third point, namely that it introduces hidden timeouts (keyd having hitherto strongly favoured sequence based logic over time based logic), however now that we have overload2 I suppose that pandora's box has already been opened.

Whenever I encounter feature listings or comparisons of keymapping software/firmware, combos are always prominently on the list (see for example the comparison tables here or here). I conclude from this that combo functionality is an integral part of every remapping solution

Argumentum ad populum. I tend to take a more conservative approach (just because someone submits a QMK PR doesn't mean the feature is necessarily usable), however I can see the utility of this particular one.

but I strongly disagree with the 90%. Actually I would go as far as saying that the suggested technique does not offer a valid (partial) alternative at all, because the statement

a = overload2(a, a, 50)

where the first argument is the layer a and the second the letter key a, makes it almost impossible to type the letter a (by tapping in under 50 ms).

Agreed, I probably should have provided a more realistic example. Something like:

a = overload2(a, a, 200)
s = overload2(s, s, 200)

[a]
s = layer(control)

[s]
a = layer(control)

The timeouts have the luxury of being longer than the chording interval, since by the time the third key is struck, it is likely that one of the two timeouts will have triggered. The disadvantages, as you've observed, are the additional visual latency and the fact that it doesn't work well for arbitrary actions (since non-layer actions don't involve a third key).

Anyway, because of the cumbersome and exploding syntax, I doubt that anybody would actually configure longer combos or deal with multiple nesting or overlapping ones.

One would hope that the syntax is not the only reason ;).

We could now of course say that longer combos are not useful or usable and hence keyd should steer users away from them, but then all stenotypists would prove us wrong.

I don't think this analogy is applicable. When you are dealing exclusively with chords, life becomes a lot easier since you don't have to try and disambiguate intent.

That's a subject that @rvaiya and I might want to put into a niche... but we've already seen numerous issues of users expecting to be able to do that.

Fair point :P

Given that people want to overload letter keys, it seems a good idea to push them in the right direction. And if a right direction exists, my intuition is that combos are it. They are a relatively benign special case of letter key overloading: visual latency is limited (by both the tiny interval and the fact that interrupts by non-combo keys can immediately resolve the ambiguity). Misfires are mitigated by the limited context in which ambiguities can occur, and are further minimized when adding a required hold time. Finally, users won't have to worry about picking the right timeout mechanics and layer configuration that we've been wrestling with lately.

This sold me on it.

My one worry is that the proposed syntax is so nice and natural that it might misguide users into thinking that what they're doing is trivial and/or the most obvious solution. I also foresee more issues caused by rollover, and we'd have to caution that control+f is not the same as C-f.

+1. This is the other thing that has been bothering me. I've also considered other possible interpretations of a+b = c, but chording probably makes the most sense given the current design.

Disclaimer: This endorsement is on purely theoretical grounds as of time of writing :P

I am theoretically on board ;).

@herrsimon
Contributor Author

@slakkenhuis

I don't think that it is a niche feature at all --- it's just a variant of overloading letter keys. That's a subject that @rvaiya and I might want to put into a niche... but we've already seen numerous issues of users expecting to be able to do that. (In fact, we experimented with it as well!)

As keyd itself already caters to a niche market, we are discussing niches within a niche here anyway.

Given that people want to overload letter keys, it seems a good idea to push them in the right direction. And if a right direction exists, my intuition is that combos are it. They are a relatively benign special case of letter key overloading: visual latency is limited (by both the tiny interval and the fact that interrupts by non-combo keys can immediately resolve the ambiguity). Misfires are mitigated by the limited context in which ambiguities can occur, and are further minimized when adding a required hold time. Finally, users won't have to worry about picking the right timeout mechanics and layer configuration that we've been wrestling with lately.

I completely agree, a combo with a required hold time should be usable for most users, irrespective of their typing speed.

(But the inconsistency with timeouts being explicit in other areas would bother me a little.)

As long as the end user doesn't notice it, this is bearable in my opinion.

My one worry is that the proposed syntax is so nice and natural that it might misguide users into thinking that what they're doing is trivial and/or the most obvious solution.

While hiding complexity is not a bad thing, I agree that one would have to guard somehow against overuse of this feature.

I also foresee more issues caused by rollover, and we'd have to caution that control+f is not the same as C-f. On the other hand, there would now be a natural point in the manual to point out such drawbacks.

I agree, and as my tests suggest, cross-hand bigrams are frequently entered below the combo threshold. This problem can never be avoided but should only occur when defining two key cross-hand combos. The best way out would be a big caveat in the manual regarding this case.

On the whole, I'm unexpectedly finding myself in favour of this feature --- much less on the fence than I was about overload2. As @herrsimon mentioned, where overload2 overlaps with this use case, it quickly becomes unwieldy. If I had to choose between the two [1], I'd drop the latter in a heartbeat.

I'm very happy to read that!

@rvaiya

My one worry is that the proposed syntax is so nice and natural that it might misguide users into thinking that what they're doing is trivial and/or the most obvious solution.

This should be true for most people, but there will always be exceptions. Besides, the combos s+d+f or f+thumbkey_left+j+thumbkey_right are in my opinion easier to trigger than a+s or a+f, as they leave out the pinky and/or use strong fingers. Because of such examples, combo size should not be artificially limited.

I don't think that combos are a niche feature (or do you have some data to back this claim up?).

It is merely a hunch :P. It strikes me as mainly being useful for anorexic keyboards like the gergo where real estate is precious.

People using such keyboards argue that minimizing finger travel results in a much more pleasurable typing experience. As you might have guessed, I belong to this camp and successfully got rid of the top (number) row and the two keys above enter, with the end goal being a corne-like setup with a trackpoint somewhere between a custom placed thumb cluster.

Argumentum ad populum.

You are right, I apologize for this.

One would hope that the syntax is not the only reason ;).

Completely agree.

We could now of course say that longer combos are not useful or usable and hence keyd should steer users away from them, but then all stenotypists would prove us wrong.

I don't think this analogy is applicable. When you are dealing exclusively with chords, life becomes a lot easier since you don't have to try and disambiguate intent.

On a steno keyboard, isolated presses have meaning as well and occur frequently, for example as prefix strokes (turning “round” into “around” etc.), so this analogy applies perfectly in my opinion. In fact, disambiguation should essentially be the same as in a steno software (which has the slight advantage of being able to take later combos into account, as meaning is typically inferred from combo sequences). In any case, stenotyping proves that using combos of arbitrary lengths is not obscure.

Given that people want to overload letter keys, it seems a good idea to push them in the right direction. And if a right direction exists, my intuition is that combos are it. They are a relatively benign special case of letter key overloading: visual latency is limited (by both the tiny interval and the fact that interrupts by non-combo keys can immediately resolve the ambiguity). Misfires are mitigated by the limited context in which ambiguities can occur, and are further minimized when adding a required hold time. Finally, users won't have to worry about picking the right timeout mechanics and layer configuration that we've been wrestling with lately.

This sold me on it.

Yes!

I am theoretically on board ;).

Double yes!

@rvaiya
Owner

rvaiya commented Oct 1, 2022

This should be implemented in the latest commit.

@rvaiya rvaiya closed this as completed Oct 1, 2022
@xpusostomos

@rvaiya when you say key combos are "implemented in the latest commit", does it support emacs style combos? i.e. C-x C-f will do something, and be treated as a combination? If so, I'd like to see a sample config to map some emacs keys in CUA style apps.

@nsbgn

nsbgn commented Sep 16, 2023

@rvaiya when you say key combos are "implemented in the latest commit", does it support emacs style combos? i.e. C-x C-f will do something, and be treated as a combination?

Combos here mean keys that are literally pressed at the same time. If you mean that, then yes, it does:

[control]
x+f = <something>

However, if I understand correctly, by emacs style you probably mean a more prefix-key situation? You could combine oneshot keys and composite layers to achieve that:

[control]
x = oneshot(controlX)

[controlX]

[control+controlX]
f = <something>

Is that satisfactory? If not, it probably merits a separate issue :)

@atanasj

atanasj commented Oct 29, 2023

Is there some timing setting that I need to consider when writing overlapping combos? E.g.:

j+k = tab
k+l = enter
j+l = backspace
j+k+l = oneshot(M-A-S-C) # not sure if this is correct, but what I mean is a 'hyper'-like key 

When I include the j+k+l it interferes with / breaks the other combos.

@nsbgn

nsbgn commented Oct 30, 2023

There is a timeout, namely:

chord_timeout: The maximum time between successive keys interpreted as part of a chord. (default: 50)

However, oneshot(M-A-S-C) isn't correct, because M-A-S-C isn't a layer. Add the following custom empty layer and use oneshot(customlayer):

[customlayer:M-A-S-C]

If you experience more interference, it probably merits a separate issue with example keyd monitor output.
