Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

terminal/parser: determine what to do when split escape sequence writes show up #4037

Open
fabian-z opened this issue Dec 21, 2019 · 19 comments
Labels
Area-Input Related to input processing (key presses, mouse, etc.) Issue-Task It's a feature request, but it doesn't really need a major design. Priority-3 A description (P3) Product-Conpty For console issues specifically related to conpty
Milestone

Comments

@fabian-z
Copy link

fabian-z commented Dec 21, 2019

Environment

Windows build number: Version 10.0.18363.535
(reproducible with OpenConsole from master)

Windows Terminal version (if applicable): n/a

Any other software?
- Simple ConPTY test code

Steps to reproduce

  • Create pseudo console as seen in examples
  • Copy I/O to and from serial / COM port; alternatively manually write CSI or escape sequence in multiple write operations.

Example for arrow-up key (as seen during development with serial ports):
Write 1: 0x1b
Write 2: 0x5b 0x41

Expected behavior

  • Working escape sequences over multiple write operations to PTY input pipe

Example for arrow-up key:

$ showkey -a

Press any keys - Ctrl-D will terminate this program

^[[A 	 27 0033 0x1b
 	 91 0133 0x5b
 	 65 0101 0x41

Actual behavior

Due to FlushAtEndOfString being true, the escape sequence is split up and rendered wrong. E.g. arrow-up becomes:

^[       27 0033 0x1b
[A       91 0133 0x5b
         65 0101 0x41

which only renders "[A" to the PTY without the desired effect.

This is 100% reproducible, i.e. the arrow keys never work because they are always received as two reads from the (virtual) serial port.

Note: A custom build of conhost.exe (OpenConsole.exe) with FlushAtEndOfString changed to return false does correctly process split escape sequence writes.

Comments

The comment in the caller in stateMachine.cpp says the following:

        // <kbd>alt+[</kbd>, <kbd>A</kbd> would be processed like `\x1b[A`,
        // which is _wrong_).
        //
        // Fortunately, for VT input, each keystroke comes in as an individual
        // write operation.

I think both statements here are wrong, or at least not universally true. Typing Alt+[ and A on a VTE based Linux terminal emulator (e.g. gnome-terminal or xfce4-terminal) will result in the same reaction as pressing the arrow-up key.

Is there any standard that guarantees VT input sequences to happen in single writes? I tested Linux support of split escape sequence writes - it worked fine on Linux 5.4 framebuffer getty TTY and on all tested PTY terminal emulators (xterm, VTE, etc.).

Related: #2823

@ghost ghost added Needs-Triage It's a new issue that the core contributor team needs to triage at the next triage meeting Needs-Tag-Fix Doesn't match tag requirements labels Dec 21, 2019
@egmontkob
Copy link

Is there any standard that guarantees VT input sequences to happen in single writes?

I don't think there's ever such a guarantee.

Traffic might go over channels which don't even have this concept (e.g. the serial line). I don't know if the standard OS methods for reading from a serial line wait for more data before returning if data is currently flowing in, or return the already available chunk immediately. This might depend on some setting (e.g. O_NONBLOCK), and obviously on the OS.

Traffic might go over layers and channels (e.g. TCP, ssh, tmux...) that are allowed to split or join fragments.

That being said, if you press a key which generates a 3 byte sequence, and your terminal emulator writes that in a single step, you can be reasonably certain that it will arrive at the reading app in a single step.

I recommend the terminal to write them in a single step. Some applications rely on heuristics (i.e. timing) to figure out what happened, e.g. to distingusing between Alt+[ followed by A, vs. the Up arrow. The most notable example is probably vim's handling of the Esc keypress. It's good to help them.

// <kbd>alt+[</kbd>, <kbd>A</kbd> would be processed like `\x1b[A`,
// which is _wrong_).

I can't see how this is relevant there, though. Doesn't this comment mix up the two directions? It's in src/terminal/parser/stateMachine.cpp, StateMachine::ProcessString(), suggesting that it's about the terminal parsing its incoming data. When you press a key, the terminal generates the outgoing sequence. The terminal doesn't need to parse keypresses. (Or does WT need to, due to its extremely complex architecture with ConPTY and whatnot?)

@fabian-z
Copy link
Author

Good point on the possibility of using timing heuristics.

The terminal doesn't need to parse keypresses.

Actually, the new ConHost does need to parse keypresses from VT sequences for backwards compatibility with the old Console API. See the architecture chart from the introducing blog post.

@j4james
Copy link
Collaborator

j4james commented Dec 24, 2019

If, as I understand it, this is about forwarding keypresses over the conpty connection, then I don't think it's worth trying to fix, because it's never going to satisfy everyone, and there are many other ways in which it can fail.

In the long term I think the solution is to use a more robust protocol for keypresses that doesn't have these problems (see #879 for discussions along these lines). My personal preference would be the DECPCTERM mode.

@nicm
Copy link

nicm commented Dec 24, 2019

That being said, if you press a key which generates a 3 byte sequence, and your terminal emulator writes that in a single step, you can be reasonably certain that it will arrive at the reading app in a single step.

On localhost and fast networks you mostly get away with it but I think that is partly because the consequences of it going wrong are pretty mild - the user just presses the key again.

There is no atomicity guarantee here, applications need to handle partial reads to be correct.

@fabian-z
Copy link
Author

fabian-z commented Jan 2, 2020

If, as I understand it, this is about forwarding keypresses over the conpty connection, then I don't think it's worth trying to fix

I have used arrow keys over VT terminals (mostly with the mentioned terminal emulators) for years with all sorts of hardware and software, including embedded devices, without a problem. It is the de facto standard and only partially supporting it would force many applications to find workarounds.

the consequences of it going wrong are pretty mild - the user just presses the key again.

For clarification: In my case, the issue described above is 100% reproducible, i.e. the arrow keys never work because they are always received as two reads from the (virtual) serial port. I added this to my issue description

@j4james
Copy link
Collaborator

j4james commented Jan 2, 2020

I have used arrow keys over VT terminals (mostly with the mentioned terminal emulators) for years with all sorts of hardware and software, including embedded devices, without a problem. It is the de facto standard and only partially supporting it would force many applications to find workarounds.

The backend in this case is not a VT application though. It's potentially a legacy Windows app that knows nothing about VT escape sequences, and might expect to be able to differentiate between simple keystrokes like Ctrl+2 and Ctrl+Space, which VT terminals treat as identical escape sequences. And that's not even considering more complicated scenarios, like say a piano playing app that needs to know exactly when each key is pressed and released, and be able to handle multiple keys held down at the same time.

If Windows Terminal is going to be a viable replacement for the existing Windows console, it has to able to run existing Windows console apps, and that means the conpty layer needs something more powerful than basic VT sequences for forwarding keystrokes. Any actual VT applications on the server side wouldn't need to care, though. They'll still just receive a simple escape sequence if that's all they want.

@fabian-z
Copy link
Author

fabian-z commented Jan 2, 2020

Testing on Windows Server 2019 (Version 10.0.17763.737) SAC / EMS with the same terminal emulator and serial port driver, the console channel type is VT-UTF8 with working support of the arrow key VT sequences.

SAC transcript

@j4james
Although we might disagree regarding realistic command line use cases, I think we agree that Windows Terminal could (given proper ConHost and API support) use another standard like PCTerm better suited to transmitting keypresses and modeling the legacy Console API interactions. This issue however, is about correct support for (split) VT sequences in ConPTY and does not involve Windows Terminal as an application.

@j4james
Copy link
Collaborator

j4james commented Jan 2, 2020

This issue however, is about correct support for (split) VT sequences in ConPTY and does not involve Windows Terminal as an application.

I don't think you can really claim there is one "correct" way to handle split VT sequences though. Every choice involves some kind of compromise. Typically it would be up to receiving application to decide how to handle things based on their particular needs. And if conpty used a more robust protocol for passing on the keystrokes then it wouldn't need to worry about any of this - the control would be back in the hands of the receiving application.

@DHowett-MSFT DHowett-MSFT changed the title terminal/parser: split escape sequence writes not working terminal/parser: determine what to do when split escape sequence writes show up Jan 6, 2020
@DHowett-MSFT DHowett-MSFT added Area-Input Related to input processing (key presses, mouse, etc.) Issue-Task It's a feature request, but it doesn't really need a major design. Priority-3 A description (P3) Product-Conpty For console issues specifically related to conpty labels Jan 6, 2020
@ghost ghost removed the Needs-Tag-Fix Doesn't match tag requirements label Jan 6, 2020
@DHowett-MSFT DHowett-MSFT added this to the Console Backlog milestone Jan 6, 2020
@DHowett-MSFT DHowett-MSFT removed the Needs-Triage It's a new issue that the core contributor team needs to triage at the next triage meeting label Jan 6, 2020
@fabian-z
Copy link
Author

fabian-z commented Jan 16, 2020

@j4james
Excuse me if my statement was not unambiguous - English is not my first language.
I was using the following definitions of "correct":

Merriam-Webster, Oxford Dictionary

conforming to an approved or conventional standard [...] behavior

The "conventional standard" here being the behavior shown (at least) by Linux VTE & other terminals, most embedded devices with serial connections and the Windows Server SAC/EMS itself.

@fabian-z
Copy link
Author

fabian-z commented Jan 25, 2020

Searching for relevant standards covering this topic, I found an old manual stating that

The VT100 was Digital's first terminal based on the ANSI standard for control sequence encoding (ANSI X3.64).

Although there seem to be extensions and specification of edge cases making it desirable to implement DEC compatible terminal emulation (see here), the standard that introduced (among others) CSI escape sequences is:

ANSI X3.64-1979 (Additional Controls for Use with American National Standard Code for Information Interchange) - Google Books

I think the (informational purpose) Appendix section supports the viewpoint that the structure of the input stream should not affect the operation and that subsequent characters after control sequence introductions should be processed as parameters as long as they don't result in an invalid sequence.

Excerpt from Appendix B (in our case the "device" would be ConPTY):

Appendix B - Device Concepts

[...]

This Appendix is written from the point of view of the device.
Therefore the term received data stream refers to the data stream received by the device and the term transmitted data stream refers to the data stream transmitted from the device.

B2. The Received Data Stream

The received data stream is considered to be a continuous stream received by the device.
The received data stream may be structured in messages, records and/or blocks, but this does not affect the operation of the device at the abstract level of description; the logical or physical units of data are regarded as being concatenated to form a continuous stream.

[...]

B5. Processing the Received Data Stream

B5.1 General

A device processes a received data stream one character after the other in the logical steps given in B5.2 through B5.4

B5.2 Control Characters

If the control character specifies a control function that can be executed by the device the corresponding operation is performed according to the definition.

[...]

When a control sequence is processed which requires one or more parameters to complete the specification of the control function, the characters representing the parameters are collected from the received data stream and processed as a part of the control operation.
These characters are not separately processed by step 2 or 3 below.

The current standard should be ISO/IEC 6429:1992, but it is not available for free and I do not have a copy to check.

@fabian-z
Copy link
Author

fabian-z commented Mar 4, 2021

Any chance to move this issue from backlog or increase priority?
It has been well over a year with no progress and ConPTY control sequence handling is still not standard conformant.

After further research, the current open standard seems to be ECMA-48:

The first edition of this Standard ECMA-48 was published in 1976. Further editions followed. The fourth edition, published in 1986 was adopted by ISO/IEC under the fast-track procedure as second edition of ISO 6429.

Relevant quote:

The data stream is considered to be a continuous stream. It may be structured in messages, records and/or blocks, but this does not affect the operation of the devices at the abstract level of description in this Standard; the logical or physical units of data are regarded as being concatenated to form one continuous data stream

Source: https://www.ecma-international.org/publications-and-standards/standards/ecma-48/ (Section 6.2 - The data stream)

ConPTY does treat input buffers as messages, breaking control sequences apart. This is invalid behaviour according to the standard.

@DHowett
Copy link
Member

DHowett commented Mar 4, 2021

Thanks for following up. In the year since this issue was filed, it's gained very little support. I know that it's a standards compliance issue, and I value standards compliance, but I can't justify prioritizing it over some of the other stuff we've got going on. It's a complicated problem space where everyone has a different solution (a timeout if you receive ESC at the end of a packet, sending both ESC and the next character in case of a fault, etc.).

Unfortunately, while the letter of the standard specifies that the data stream is continuous it does not appear to specify the behavior of a conforming application in the face of ambiguous stream contents.

@nicm
Copy link

nicm commented Mar 4, 2021

In practice the standard can specify all it likes, there is no way to ensure escape sequences are not broken up somewhere between terminal and application. Not to mention if the user for example presses Escape then q instead of M-q. Most applications that I am aware of solve this by using a timeout if the key is ambiguous.

It is a little unfortunate that you do this so aggressively because it will break naive applications like showkey and any application where the user disables the timeout, but it is something that applications have to handle.

@fabian-z
Copy link
Author

fabian-z commented Mar 6, 2021

@DHowett Thanks for your reply and clarification.
I understand this is a complex problem space and priorities are difficult to manage. However, in my opinion this is not only a theoretical standards compliance issue but also a bug that actually prevents usage of ConPTY for applications that cannot guarantee that escape sequences are terminated within one input buffer, like network or serial port applications.
If not changed, this would require reimplementation of another layer of escape sequence handling or bundling of a patched OpenConsole.

It also seems to be possible to work around this with solutions approved by Microsoft engineering, since there is a working implementation for Windows Server SAC, which I tested in the same environment (see #4037 (comment)).

The standard defines that action in error recovery is left to the implementation. However, if a control sequence is started in one input and not finished (but also not invalid), I think the correct course of action is to wait for the next input since otherwise ConPTY does not work on an input stream but on individual input buffers (violating the standard).
A timeout could also be used as @nicm suggested (treat the input sequence as partial or error according to the spec after some amount of time without new input).

The reasoning for the current implementation seems flawed to me (see code comments or #2823 (comment)), since it says e.g.

alt+[, A would be processed like \x1b[A, which is wrong

But testing with different Linux terminal emulators, it is exactly the observed behaviour:
echo 1b5b41 | xxd -r -p - and echo 1b | xxd -r -p - && echo 5b41 | xxd -r -p - both cause the same reaction without printed output.

Fortunately, for VT input, each keystroke comes in as an individual write operation.

This cannot be guaranteed for many applications and the assumption effectively violates the standard (see above).

@determin1st

This comment was marked as off-topic.

@determin1st
Copy link

determin1st commented Mar 28, 2024

8-bit response mode solves that problem, any 8-bit control is a native continuator, so the parser could delay the parsing, 7-bit control wont go anywhere 's futile, because 1B has a meaning by itself. also the protocol, the world needs a nice key protocol

@zadjii-msft
Copy link
Member

@determin1st You're probably more interested in #4999, where we literally made a protocol for transmitting Win32 console input sequences (nearly 4 years ago now)

@determin1st
Copy link

determin1st commented Mar 28, 2024

@zadjii-msft oke, ill look through.. for now can say that INPUT_RECORD is probably overkill, KEY_RECORD yes, mouse tracking protocol exists already, would be more compatibility..

@determin1st
Copy link

determin1st commented Apr 5, 2024

ooke, as im progressing in construction of my own parser, i see that 8-bit responses dont fully solve the partial input problem (some edge cases). there must be a partial input terminator.

Putty supports ENQ request-response, but it is not wildly supported. i think another basic request - the DA1 is appropriate, because its result/answer is already known to the terminal user (any library does it upon initiation)

so after the parser returns the partial input state, stores the remaining bytes for the next attempt, it also should do the DA1 request to have a clarity whether no more bytes would follow the partial. this nicely closes all edge cases.

waiting for known input is better than waiting for unknown input.

also i was browsing through some of docs here and found this https://github.com/microsoft/terminal/blob/main/doc/specs/%23976%20-%20VT52%20escape%20sequences.md 's like with INPUT_RECORD miswording, ANSI/VT52 is a single mode it's not leaving VT52 to ANSI or entering ANSI leaving VT52 vice versa, its the same thing. those ancient docs didnt clearly say what mode activates, "previous" kind, probably DEC thing or ASCII thing. dont understand why any modern terminal user need that. should be always off

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area-Input Related to input processing (key presses, mouse, etc.) Issue-Task It's a feature request, but it doesn't really need a major design. Priority-3 A description (P3) Product-Conpty For console issues specifically related to conpty
Projects
None yet
Development

No branches or pull requests

8 participants