Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Attempt fixing repetitive chars in Japanese IME #15938

Closed
wants to merge 1 commit into from

Conversation

lhecker
Copy link
Member

@lhecker lhecker commented Sep 7, 2023

Users report in issue #14349 that the Japanese IME may print repetitive
characters since v1.15. The two potential offenders are:

I don't think either of the two commits themselves is "buggy", but
rather that they make bugs in TSF obvious. This is because backspacing
causes TermControl::_CharacterHandler to be called, which will then
call TSFInputControl().ClearBuffer(), which should instruct TSF
to flush its entire buffer and reset its state.
That doesn't seem to be happening though.

I think what's happening here is that TSF is either:

  • ignoring this request
  • getting into a bad state
  • running into race conditions due to running out-of-process

2c922e1 could have triggered the
regression because it drops clamping the signed range values from TSF.
I believe at the time I didn't consider it possible that TSF would
actually pass either negative indices or a range where
StartCaretPosition is greater than EndCaretPosition.

ed800dc could have triggered the
regression because it changed this (simplified):

_activeTextStart = min(_activeTextStart, range.StartCaretPosition);

into this:

_activeTextStart = min(_activeTextStart, _inputBuffer.size());

I think the latter is technically more correct, but if TSF is in a bad
state where it didn't properly reset, then this may be off.
(When the buffer contains "foo" and the _activeTextStart is 3 (= end),
then we shouldn't move the start to 0 just because TSF replaced the
entire string with "foobar", since the active part is just "bar".)

Closes #14349

Validation Steps Performed

  • Japanese IME (Full-Width Katakana)
    Typing "saitama" produces "サイタマ" ✅
  • Korean IME
    Typing "gksrmf" produces "한글" ✅
  • Vietnamese IME
    Typing "xin chaof" produces "xin chào" ✅
  • Emoji Picker (Win+.)

@microsoft-github-policy-service microsoft-github-policy-service bot added Issue-Bug It either shouldn't be doing this or needs an investigation. Area-Input Related to input processing (key presses, mouse, etc.) Product-Terminal The new Windows Terminal. labels Sep 7, 2023
@lhecker lhecker added this to the Terminal v1.19 milestone Sep 7, 2023
range.EndCaretPosition - range.StartCaretPosition,
incomingText);
const auto [beg, len] = _convertRangeIntoBeginAndLength(range);
_inputBuffer.replace(beg, len, incomingText);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

replace() returns a reference to *this, so assigning it to itself is not necessary.
(https://en.cppreference.com/w/cpp/string/basic_string/replace)

Comment on lines +399 to +408
std::pair<size_t, size_t> TSFInputControl::_convertRangeIntoBeginAndLength(CoreTextRange range) const noexcept
{
auto beg = gsl::narrow_cast<size_t>(std::max(0, range.StartCaretPosition));
auto end = gsl::narrow_cast<size_t>(std::max(0, range.EndCaretPosition));

end = std::min(end, _inputBuffer.size());
beg = std::min(beg, end);

return { beg, end - beg };
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI the ::base::ClampMin, etc., code in this class prior to 2c922e1 was not entirely correct either. For instance, it would pass range.StartCaretPosition to replace() directly.

Code like this:

const auto textEnd = ::base::ClampMin<size_t>(range.EndCaretPosition, _inputBuffer.length());
const auto length = ::base::ClampSub<size_t>(textEnd, range.StartCaretPosition);

is also not correct, since the T in ::base::ClampSub<T> acts just like the T in std::min<T>: It casts all parameters to <T> and then clamps the final value. That is, if you do ClampSub<size_t>(123, -1) it is the same as ClampSub(size_t(123), size_t(-1)) which is 18446744073709551615. In other words, a negative StartCaretPosition will result in length == SIZE_T_MAX (substr() doesn't really care, but... eh).

_selection = args.NewSelection();
// GH#5054: Pressing backspace might move the caret before the _activeTextStart.
_activeTextStart = std::min(_activeTextStart, _inputBuffer.size());
_activeTextStart = std::min(_activeTextStart, beg);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This reverts part of ed800dc. It still seems to work.

@lhecker
Copy link
Member Author

lhecker commented Sep 7, 2023

There's one more part that worries me about 2c922e1:
2c922e1#diff-1385b1ccad3b8d0ca82e24b8eb7ec11431204e3db23168e9ac620aa2476eb48fL124-R106

image

It works on my PC™️, but I'm worried that TSF has overflow bugs. I did this change, because back then I was already worried that TSF may go out of sync with our state. My hope was that this would reset TSF's internal text state, no matter what TSF thinks how much text we got.

@lhecker lhecker marked this pull request as draft September 7, 2023 15:23
@lhecker lhecker closed this Feb 21, 2024
@microsoft-github-policy-service microsoft-github-policy-service bot added the Issue-Task It's a feature request, but it doesn't really need a major design. label Feb 21, 2024
@lhecker lhecker deleted the dev/lhecker/14349-tsf-fixup branch August 5, 2024 21:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area-Input Related to input processing (key presses, mouse, etc.) Issue-Bug It either shouldn't be doing this or needs an investigation. Issue-Task It's a feature request, but it doesn't really need a major design. Product-Terminal The new Windows Terminal.
Projects
Status: To Cherry Pick
Development

Successfully merging this pull request may close these issues.

Terminal hangs or prints repetitive characters when using Japanese IME
1 participant