Crusher splits Unicode high/low surrogate pairs, producing incorrect strings #50
Comments
There are byteLength(str) solutions that do not use encodeURI(), but that may not solve the problem: allowing unsupported Unicode sequences will probably cause problems in other parts of the source (the calling function probably should not produce these sequences in the first place). However, answer 7, posted by fuweichin on Jan 21 2016, provides an implementation where you can change the line throw new Error("UCS-2 String malformed") and see what happens if the error is ignored: http://stackoverflow.com/questions/5515869/string-length-in-bytes-in-javascript
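To make the idea of a byte-length function without encodeURI() concrete, here is a minimal sketch. The name utf8ByteLength is hypothetical; this is neither RegPack's getByteLength nor the Stack Overflow answer. It assumes that an unpaired surrogate should be counted as 3 bytes (the size of U+FFFD in UTF-8) instead of raising an error.

```javascript
// Hypothetical helper: counts UTF-8 bytes by walking UTF-16 code units,
// so an unpaired surrogate never throws (unlike encodeURI()).
function utf8ByteLength(str) {
  let bytes = 0;
  for (let i = 0; i < str.length; i++) {
    const unit = str.charCodeAt(i);
    if (unit < 0x80) {
      bytes += 1; // ASCII
    } else if (unit < 0x800) {
      bytes += 2; // two-byte UTF-8 sequence
    } else if (unit >= 0xD800 && unit <= 0xDBFF) {
      const next = i + 1 < str.length ? str.charCodeAt(i + 1) : 0;
      if (next >= 0xDC00 && next <= 0xDFFF) {
        bytes += 4; // valid pair: one astral code point
        i++;        // skip the low surrogate
      } else {
        bytes += 3; // assumption: lone high surrogate counted like U+FFFD
      }
    } else {
      bytes += 3; // rest of the BMP; a lone low surrogate also lands here
    }
  }
  return bytes;
}

console.log(utf8ByteLength("x\uD83D\uDE80y")); // 6
console.log(utf8ByteLength("x\uD83D"));        // 4
```

As the comment above points out, tolerating the malformed sequence only hides the symptom; the split pair would still be present in the packed output.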
The issue only happens with Unicode surrogate pairs. The crusher phase has no knowledge of Unicode constraints and is unaware of high/low surrogate pairs, so it will not hesitate to break them (as a naive string reverse function does). This is what happens with your example: the string … Replacing …
The solution would be to avoid dictionary strings starting in the middle of an astral character. Proposed fix, upon building the dictionary in …
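A check for "starts in the middle of an astral character" could look roughly like the sketch below. The function splitsSurrogatePair and its arguments are hypothetical and do not come from the RegPack source; the sketch also rejects candidates that end between the two halves of a pair, which is an assumption that goes beyond the comment above.

```javascript
// Hypothetical helpers, not part of RegPack.
const isHigh = (u) => u >= 0xD800 && u <= 0xDBFF;
const isLow = (u) => u >= 0xDC00 && u <= 0xDFFF;

// True when the boundary at `index` falls between a high surrogate
// and the low surrogate that follows it.
function isMidPair(str, index) {
  return index > 0 && index < str.length &&
    isLow(str.charCodeAt(index)) && isHigh(str.charCodeAt(index - 1));
}

// True when the candidate substring str.slice(start, end) would cut
// a surrogate pair at either of its two boundaries.
function splitsSurrogatePair(str, start, end) {
  return isMidPair(str, start) || isMidPair(str, end);
}

const s = "a\uD83D\uDE80b"; // 'a', high surrogate, low surrogate, 'b'
console.log(splitsSurrogatePair(s, 1, 3)); // false: contains the whole pair
console.log(splitsSurrogatePair(s, 2, 4)); // true: starts on the low surrogate
console.log(splitsSurrogatePair(s, 0, 2)); // true: ends after the high surrogate
```

A dictionary builder could skip any candidate for which this returns true, at the cost of occasionally missing a slightly longer match.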
Possible solution in ES6: iterate the String with for...of, which yields whole code points rather than individual UTF-16 code units:

    for (const ch of 'x\uD83D\uDE80y') {
      console.log(ch.length);
    }
    // Output:
    // 1
    // 2
    // 1
… 0x10000) Tentative fix for #64 - supposed duplicate
Tweet from @magnitudoOrg :
Run under Chrome and Firefox: both fail in getByteLength (regPack.js:123) with the error "malformed URI sequence". The original JSCrush suffers from the same issue.
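The failure can be reproduced in isolation with a short sketch, assuming getByteLength relies on encodeURI() (an assumption, since its body is not quoted here). encodeURI() throws a URIError whenever the string contains an unpaired surrogate, which is exactly what a split pair leaves behind. The wording of the error message varies between engines, so the sketch only checks the error type.

```javascript
const intact = "\uD83D\uDE80";      // U+1F680 as a high/low surrogate pair
const broken = intact.slice(0, 1);  // lone high surrogate, as left by a split

console.log(encodeURI(intact));     // %F0%9F%9A%80

let caught = null;
try {
  encodeURI(broken);
} catch (e) {
  caught = e;
}
console.log(caught instanceof URIError); // true
```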