Buffer Incorrectly Encodes "\0" as 32 for ASCII Encoding #297
Comments
The reason for it is this line: http://github.com/ry/node/blob/9922e4e433996722a76edb46d14f1729f33b4bed/deps/v8/src/api.cc#L3005, whose purpose I am not sure of. I'll ask in v8-users.
Side note: the behaviour for UTF-8 has changed since this issue was raised.
Perhaps we should add this as a caveat emptor to the docs and tell people to pass non-printable characters as an array instead of a string, e.g. [0] instead of "\0" or "\u0000"?
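A minimal sketch of that workaround, assuming the current `Buffer.from` API (when this issue was filed, the equivalent call was `new Buffer([...])`). The byte values are only an illustration:

```javascript
// Build the buffer from explicit byte values so that no string encoder is involved.
const fromArray = Buffer.from([0, 72, 105, 0]); // NUL, 'H', 'i', NUL

// Each array element is stored as a byte as-is, so the NUL bytes survive.
console.log(fromArray); // <Buffer 00 48 69 00>
```

Because the array path never goes through the string writer, it sidesteps whatever the "ascii" encoder does with "\0".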
How about d39f23a?
I am so confused. Don't we have two bugs now? The "ascii" encoding converts '\0' to 32, which is a space, ' ', and 'utf8' converts to Why does the character with the code The behavior is inconsistent, seemingly arbitrary:
What is the reasoning behind any of it?
Wow. That is really, really bad. Who is implementing this upstream? Can we implement a correct encoding in Node.js and skip the capricious V8 implementations? They are not right and there is no justification for why they are the way they are. Why does Node.js show deference to them? Maybe in Chrome these encodings don't matter, but Node.js is for network programming, and for network programming, correct and fast encoding implementations matter.
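To make the idea concrete, here is a rough sketch of an encoder that does not rely on V8's string writer. It is written in JavaScript for brevity rather than C. `encodeAscii` is a hypothetical name, and masking each code unit to 7 bits is an assumption about what "ascii" ought to mean, not a description of what Node.js or V8 does:

```javascript
// Hypothetical helper for illustration; not part of Node.js or V8.
function encodeAscii(str) {
  const out = Buffer.alloc(str.length);
  for (let i = 0; i < str.length; i++) {
    // Keep the low 7 bits of each UTF-16 code unit, so code 0 stays 0.
    out[i] = str.charCodeAt(i) & 0x7f;
  }
  return out;
}

// "\0" is encoded as byte 0, not as a space (32).
console.log(encodeAscii("\0A"));
```

The point of the sketch is only that the character with code 0 needs no special case. It says nothing about speed.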
I just noticed: @koichik ~ The If it is written in C it won't be slower.
@bigeasy - I think Node cannot access the string's data directly, so it would need to copy the data twice.
@koichik - Icky. I did post an inquiry a while back on the V8 mailing list. Maybe we can patch V8 and lobby for the patch inclusion? My guess is that the It will probably take some time to figure out how to argue the point. It seems obvious to me that decoding should be agnostic about the string implementation. It is probably very logical to them that
To reproduce:
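A minimal sketch of such a check, assuming the current `Buffer.from` API (the equivalent at the time of this issue was `new Buffer(string, encoding)`). The value printed for "ascii" depends on the Node.js version in use:

```javascript
// Encode a lone NUL character under both encodings and inspect the first byte.
const nul = "\0";
const asciiByte = Buffer.from(nul, "ascii")[0];
const utf8Byte = Buffer.from(nul, "utf8")[0];

// This issue reports 32 (a space) for "ascii"; 0 is the expected value.
console.log("ascii:", asciiByte);
console.log("utf8:", utf8Byte);
```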
Correct for UTF-8:
For reference, here is an ASCII table.