Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

readline: Improve Unicode handling #25723

Closed
wants to merge 7 commits into from

Conversation

Avi-D-coder
Copy link
Contributor

Fixes: #25693

  • Makes delete and backspace work upon unicode code points instead of UTF-16 code units.
  • Prevents moving left or right from placing the cursor in between code units comprising a code point.

In plain english: Deleting and moving the cursor across emoji works now, but grapheme clusters/characters like 👨‍👩‍👧‍👦 still have trailing blank spaces.

Ideally grapheme clusters would be used instead of code points, but until the proposal-intl-segmenter lands node seems to be without built in support for handling them.

Checklist
  • make -j4 test (UNIX), or vcbuild test (Windows) passes
  • commit message follows commit guidelines

@nodejs-github-bot nodejs-github-bot added the readline Issues and PRs related to the built-in readline module. label Jan 26, 2019
@BridgeAR
Copy link
Member

BridgeAR commented Jan 27, 2019

@Avi-D-coder thanks a lot! Would you be so kind to add further tests that verify the correct positions for these two cases:

    1. Insert an astral character
    2. Press right
    3. Add another astral character
    1. Insert an astral character
    2. Press left
    3. Add another astral character

I believe some of these cases could corrupt the input when looking at the code (AFAIK all String#slice commands need the character.length and not the actual cursor position to work properly. If I'm correct we'll likely have to track two things: a) the cursor position and b) the actual character length offset).
[Update] This was a wrong assumption since the cursor handling is still the same as before. [/Update]
[Update2] See #25723 (comment) for further details for the original issue. [/Update]

Please also use the following helper function to retrieve the number of characters in an string:

function characters(str) {
  let count = 0;
  // eslint-disable-next-line no-unused-vars
  for (const char of str) {
    count++;
  }
  return count;
}

In my mini benchmark this function outperforms Array.from(c).length by 20x.

@Avi-D-coder
Copy link
Contributor Author

@BridgeAR Will do

@BridgeAR
Copy link
Member

BridgeAR commented Jan 27, 2019

@Avi-D-coder you could also calculate the correct length offset with the cursor instead of tracking it as well. That would probably be easier (just a bit more CPU expensive but it should not hurt all that much).

function cursorToLength(cursor, str) {
  let len = 0;
  for (const char of str) {
    len += char.length;
    if (--cursor === 0)
      break;
  }
  return len;
}

This function could be used in all places that use the cursor position to slice something or that is compared to the line length.

Please also note that all occurrences of e.g. this.cursor = this.line.length also have to be updated (and they are almost everywhere).

@Avi-D-coder
Copy link
Contributor Author

Avi-D-coder commented Jan 29, 2019

@BridgeAR I added the tests and optimizations. I was not able to produce corrupt characters on this commit or the previous.

I renamed your characters function to codePointLen, since the concept of a character is not very well defined and tends to create issues (see the lsp range debacle).

I'm not entirely sure why cursorToLength is needed, but codePointLeftOf and codePointRightOf are I think a similar optimization.

Copy link
Member

@BridgeAR BridgeAR left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking really good. I just left a few comments and ctrl + b and ctrl + f should also be changed accordingly.

lib/readline.js Show resolved Hide resolved
lib/readline.js Show resolved Hide resolved
lib/readline.js Outdated Show resolved Hide resolved
@Avi-D-coder
Copy link
Contributor Author

@BridgeAR I included your code.

@BridgeAR
Copy link
Member

BridgeAR commented Feb 6, 2019

LGTM, just ctrl + b (back one character) and ctrl + f (forward one character) should also be changed accordingly (we could do that in a separate PR but it feels natural to keep the functionality aligned).

node/lib/readline.js

Lines 836 to 837 in 91adbe1

case 'b': // back one character
this._moveCursor(-1);

node/lib/readline.js

Lines 840 to 841 in 91adbe1

case 'f': // forward one character
this._moveCursor(+1);

@Avi-D-coder
Copy link
Contributor Author

@BridgeAR my bad. I could have sworn I adjusted ctrl + f/b

lib/readline.js Outdated Show resolved Hide resolved
@BridgeAR
Copy link
Member

BridgeAR commented Feb 7, 2019

@Avi-D-coder thanks a lot for sticking to it! I just found the issue that confused me originally and it seems to be the last obstacle. Thanks again for reacting so quickly to everything! ❤️

lib/readline.js Outdated Show resolved Hide resolved
@BridgeAR
Copy link
Member

BridgeAR commented Feb 7, 2019

@BridgeAR BridgeAR added the author ready PRs that have at least one approval, no pending requests for changes, and a CI started. label Feb 7, 2019
@Avi-D-coder
Copy link
Contributor Author

Avi-D-coder commented Feb 8, 2019

Anyone know the reason for the inconsistent ci?

@lpinca
Copy link
Member

lpinca commented Feb 9, 2019

@Avi-D-coder
Copy link
Contributor Author

Do any changes need to be made? Why are those two ci environments failing?

@addaleax
Copy link
Member

@Avi-D-coder It looks like this is failing tests when Node.js is compiled with ./configure --without-intl?

https://ci.nodejs.org/job/node-test-commit-linux-containered/10565/nodes=ubuntu1604_sharedlibs_withoutintl_x64/testReport/junit/(root)/test/parallel_test_readline_interface/

assert.js:86
  throw new AssertionError(obj);
  ^

AssertionError [ERR_ASSERTION]: Expected values to be strictly equal:

1 !== 2

    at /home/iojs/build/workspace/node-test-commit-linux-containered/test/parallel/test-readline-interface.js:674:14
    at Array.forEach (<anonymous>)
    at Object.<anonymous> (/home/iojs/build/workspace/node-test-commit-linux-containered/test/parallel/test-readline-interface.js:87:17)
    at Module._compile (internal/modules/cjs/loader.js:735:30)
    at Object.Module._extensions..js (internal/modules/cjs/loader.js:746:10)
    at Module.load (internal/modules/cjs/loader.js:627:32)
    at tryModuleLoad (internal/modules/cjs/loader.js:570:12)
    at Function.Module._load (internal/modules/cjs/loader.js:562:3)
    at Function.Module.runMain (internal/modules/cjs/loader.js:798:12)
    at internal/main/run_main_module.js:29:11

@addaleax addaleax removed the author ready PRs that have at least one approval, no pending requests for changes, and a CI started. label Feb 17, 2019
@Avi-D-coder
Copy link
Contributor Author

@addaleax This fix requires node to correctly handle unicode. I am not exactly sure what --without-intl does, but if no internationalization means not supporting unicode then these tests should not be run for --without-intl builds.

How can these tests be blacklisted for --without-intl builds.

@richardlau
Copy link
Member

@addaleax This fix requires node to correctly handle unicode. I am not exactly sure what --without-intl does, but if no internationalization means not supporting unicode then these tests should not be run for --without-intl builds.

How can these tests be blacklisted for --without-intl builds.

common.hasIntl.

@addaleax
Copy link
Member

@Avi-D-coder
Copy link
Contributor Author

The --without-intl build pases tests on my machine now. Could someone CI this again.

@vsemozhetbyt
Copy link
Contributor

lib/readline.js Outdated Show resolved Hide resolved
@richardlau
Copy link
Member

@Avi-D-coder
Copy link
Contributor Author

The failing tests are under a security embargo. What is the next step here?

@richardlau
Copy link
Member

The failing tests are under a security embargo. What is the next step here?

Waiting until the CI is unlocked after the security releases are done.

@BridgeAR BridgeAR added the author ready PRs that have at least one approval, no pending requests for changes, and a CI started. label Feb 26, 2019
@BridgeAR
Copy link
Member

BridgeAR commented Mar 1, 2019

Resumed CI https://ci.nodejs.org/job/node-test-pull-request/21093/ ✅ (besides Windows fanned).

Rebuild Windows fanned: https://ci.nodejs.org/job/node-test-commit-windows-fanned/25073/ ✔️

BridgeAR pushed a commit to BridgeAR/node that referenced this pull request Mar 1, 2019
Prevents moving left or right from placing the cursor in between code
units comprising a code point.

PR-URL: nodejs#25723
Fixes: nodejs#25693
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Ruben Bridgewater <ruben@bridgewater.de>
@BridgeAR
Copy link
Member

BridgeAR commented Mar 1, 2019

Landed in fedc31b 🎉

@Avi-D-coder congratulations on your first commit to Node.js and a big thanks for sticking to it! ❤️ This was great work.

@BridgeAR BridgeAR closed this Mar 1, 2019
@Avi-D-coder
Copy link
Contributor Author

@BridgeAR thanks for all your help.

addaleax pushed a commit that referenced this pull request Mar 1, 2019
Prevents moving left or right from placing the cursor in between code
units comprising a code point.

PR-URL: #25723
Fixes: #25693
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Ruben Bridgewater <ruben@bridgewater.de>
@BridgeAR BridgeAR mentioned this pull request Mar 4, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
author ready PRs that have at least one approval, no pending requests for changes, and a CI started. readline Issues and PRs related to the built-in readline module.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants