
isLength fails with some emojis #1941

Closed
cancerberoSgx opened this issue Mar 18, 2022 · 5 comments

Comments

@cancerberoSgx

isLength fails with some emojis.

Examples

const validator = require('validator')
console.log('Failing with emojis', validator.isLength('👩🦰👩👩👦👦🏳️🌈', {min: 1, max: 8}));
// false
console.log('OK without emojis', validator.isLength('12345678', {min: 1, max: 8}));
// true

Additional context
Validator.js version: latest
Node.js version: 14.19.0
OS platform: macOS

@cancerberoSgx
Author

cancerberoSgx commented Mar 18, 2022

BTW: I tried some other implementations, and the only one I found that counts this string correctly is lodash's toArray. IMO, using the same implementation should solve the issue:

const _ = require('lodash')

const s1 = '👩🦰👩👩👦👦🏳️🌈'
console.log('lodash toArray', _.toArray(s1).length) // 8
console.log('Array.from', Array.from(s1).length) // 9
console.log('match /./gu', s1.match(/./gu).length) // 9
console.log('spread', [...s1].length) // 9

P.S.: underscore's toArray didn't work.
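For comparison, here is a sketch that reproduces the different counts without lodash: UTF-16 code units, code points, and grapheme clusters. It assumes a runtime with `Intl.Segmenter` (Node.js 16 or later, so not the Node.js 14.19.0 from the report), and it writes the string with `\u{...}` escapes so every code point is visible. It also assumes the reported string contains no zero-width joiners, which is what the 9-code-point result above implies. `Intl.Segmenter` implements Unicode grapheme segmentation, which is a different algorithm from lodash's regex, even though both give 8 for this string.

```javascript
// Reported string written with escapes (assumption: no ZWJ characters).
// U+1F3F3 (white flag) is followed by U+FE0F (variation selector-16).
const s1 = '\u{1F469}\u{1F9B0}\u{1F469}\u{1F469}\u{1F466}\u{1F466}\u{1F3F3}\uFE0F\u{1F308}';

// UTF-16 code units: what String.prototype.length reports.
const codeUnits = s1.length; // 17

// Code points: what Array.from, spread and /./gu iterate over.
const codePoints = Array.from(s1).length; // 9

// Extended grapheme clusters: closer to what a user perceives as "characters".
// Intl.Segmenter requires Node.js 16+.
const segmenter = new Intl.Segmenter('en', { granularity: 'grapheme' });
const graphemes = [...segmenter.segment(s1)].length; // 8

console.log({ codeUnits, codePoints, graphemes });
```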

@WikiRik
Member

WikiRik commented Mar 21, 2022

Do you also know why it fails?

@cancerberoSgx
Author

cancerberoSgx commented Mar 22, 2022

My "why" was in the lodash tip ;) Probably along the lines of:
https://github.com/lodash/lodash/blob/master/.internal/unicodeToArray.js

Other useful links:
https://github.com/lodash/lodash/blob/master/.internal/hasUnicode.js
https://github.com/lodash/lodash/blob/master/toArray.js

Sorry, I don't have much time for a PR right now :(

BTW: I'm actually using the express-validator library, which relies on this, so I'll end up using a custom validator in the meantime.

@nick-cd

nick-cd commented Mar 23, 2022

I'm a beginner at Unicode. I just want to add my thoughts to test my knowledge and help out :). If I mention anything inaccurate, I apologize in advance.

Do you also know why it fails?

It seems like 🏳️ is the offending character in that string. Note that it is not the same as 🏳, which would not have caused this issue.

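The offending flag char consists of three distinct escape sequences. Specifically, it has a high surrogate (\uD83C), a low surrogate (\uDFF3), and the variation selector U+FE0F (\uFE0F), which is a non-spacing combining mark.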

Thus, it is not simply an astral symbol but a grapheme cluster. The current implementation of isLength() only accounts for astral-plane code points (surrogate pairs). As a result, isLength() counts the stray non-spacing combining mark as an additional character, so the string is measured as 9 characters instead of 8 and exceeds max: 8.
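A minimal sketch of the counting described above. `lengthBySurrogatePairs` and `lengthIgnoringVariationSelectors` are hypothetical helper names, not validator.js APIs. The first counts each surrogate pair as one character, as described in the comment; the second also discounts the presentation selectors U+FE0E/U+FE0F. This is not necessarily what #1967 does, and it still does not treat ZWJ sequences or other combining marks as single characters, so it is narrower than full grapheme segmentation.

```javascript
// Hypothetical: count each astral symbol (high + low surrogate) as one character.
function lengthBySurrogatePairs(str) {
  const surrogatePairs = str.match(/[\uD800-\uDBFF][\uDC00-\uDFFF]/g) || [];
  return str.length - surrogatePairs.length;
}

// Hypothetical: same as above, but also ignore the text/emoji presentation
// selectors U+FE0E and U+FE0F, so '\u{1F3F3}\uFE0F' counts as one character.
function lengthIgnoringVariationSelectors(str) {
  const surrogatePairs = str.match(/[\uD800-\uDBFF][\uDC00-\uDFFF]/g) || [];
  const selectors = str.match(/[\uFE0E\uFE0F]/g) || [];
  return str.length - surrogatePairs.length - selectors.length;
}

// Reported string (assumption: no ZWJ characters).
const reported = '\u{1F469}\u{1F9B0}\u{1F469}\u{1F469}\u{1F466}\u{1F466}\u{1F3F3}\uFE0F\u{1F308}';
console.log(lengthBySurrogatePairs(reported)); // 9, fails {min: 1, max: 8}
console.log(lengthIgnoringVariationSelectors(reported)); // 8, passes
```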

@rubiin
Member

rubiin commented Jul 18, 2022

#1967 merged; watch out for the next version.

@rubiin rubiin closed this as completed Jul 18, 2022

4 participants