Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

utf8 problem in _third_party_main.js #10673

Closed
Tunga37 opened this issue Jan 7, 2017 · 8 comments
Closed

utf8 problem in _third_party_main.js #10673

Tunga37 opened this issue Jan 7, 2017 · 8 comments
Labels
build Issues and PRs related to build files or the CI. lib / src Issues and PRs related to general changes in the lib or src directory.

Comments

@Tunga37
Copy link

Tunga37 commented Jan 7, 2017

  • Version: 7.1 ... 7.4
  • Platform: 4.4.0-57-generic Mani #78-Ubuntu SMP Fri Dec 9 23:50:32 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
  • Subsystem:

Since version 7.1 (node.js) there is a problem with utf8 chars, but only when you put your code into the file lib/_third_party_main.js (this path has to be included in node.gyp)

let say _third_party_main.js looks like this

(function () {
"use strict";
console.log("pchnąć w tę łódź jeża lub ośm skrzyń fig"); //some polish words
}());

if you then run simple ./configure and make

and after long successful process of compilation you will just invoke ./node

then you will get

pchn�� w t� �ódź jeża lub o�m skrzy� fig

utf8 has been destroyed

@mscdex mscdex added the build Issues and PRs related to build files or the CI. label Jan 7, 2017
@aqrln
Copy link
Contributor

aqrln commented Feb 2, 2017

I can confirm this. It was introduced in #5458. @bnoordhuis, does V8 require two-byte external strings to be aligned? It would be nice to have a ./configure option to switch between one-byte external strings (which would be the default since there are only ASCII characters in lib) and either external two-byte strings or UTF-8 strings in managed memory for those who build desktop applications with Node as a single binary (or whatever else the use case may be). I can open a PR for that.

@joyeecheung joyeecheung added the lib / src Issues and PRs related to general changes in the lib or src directory. label Feb 2, 2017
@bnoordhuis
Copy link
Member

@Tunga37 @aqrln Can you check if #11129 works for you?

@aqrln
Copy link
Contributor

aqrln commented Feb 2, 2017

@bnoordhuis should work, except (probably) endianness on some architectures. Basically I just did the same thing but with a ./configure option saved in a gyp variable and passed into js2c.py and it worked pretty well. Though I like your approach with auto-detection more.

@Kamill90
Copy link

Kamill90 commented Feb 6, 2017

Hi, i need to load csv file with polish symbol. I have tried every praser i found. Always same result as @Tunga37. Have you managed to solve this issue? U can contact me https://www.facebook.com/kamil.lewandowski.3532507 or here. I would appreciate.

@aqrln
Copy link
Contributor

aqrln commented Feb 6, 2017

@Kamill90 this is definitely not the issue that @Tunga37 reported, and not related to Node at all, actually, just the libraries you have tried or that how you use them. CSV parsing happens in userland while this issue is about core libraries compiled into Node's binary. You can ask for help in https://github.com/nodejs/help/issues.

@Tunga37
Copy link
Author

Tunga37 commented Feb 6, 2017

@bnoordhuis your fix doesn't work for me because I work on node7.5 when this fix is for your version of io.js ( at least I think so)
The problem is definitely in file src/node_javascript.cc

Local<String> MainSource(Environment* env) {
  auto maybe_string =
      String::NewExternalOneByte(
          env->isolate(),
          &internal_bootstrap_node_external_data);
  return maybe_string.ToLocalChecked();
}

when it used to be

Local<String> MainSource(Environment* env) {
  return String::NewFromUtf8(
      env->isolate(),
      reinterpret_cast<const char*>(internal_bootstrap_node_native),
      NewStringType::kNormal,
      sizeof(internal_bootstrap_node_native)).ToLocalChecked();
}

@Kamill90
It's exactly what @aqrln said. Your problem is probably not connected with issue which I reported. Unless you are trying to build in this csv mechanism into the source of node.js's binaries.

@jasnell
Copy link
Member

jasnell commented Feb 6, 2017

@Tunga37 the fix being put forward by @bnoordhuis would land in Node.js master and would be picked back into our v7.x branch with an upcoming 7.x release.

@Tunga37
Copy link
Author

Tunga37 commented Feb 6, 2017

Thank You All

@Tunga37 Tunga37 closed this as completed Feb 13, 2017
bnoordhuis added a commit to bnoordhuis/io.js that referenced this issue Feb 13, 2017
Detect it when source files in lib/ are not ASCII.  Decode them as UTF-8
and store them as UTF-16 in the binary so they can be used as external
string resources without non-ASCII characters getting mangled.

Fixes: nodejs#10673
PR-URL: nodejs#11129
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: James M Snell <jasnell@gmail.com>
italoacasas pushed a commit to italoacasas/node that referenced this issue Feb 14, 2017
Detect it when source files in lib/ are not ASCII.  Decode them as UTF-8
and store them as UTF-16 in the binary so they can be used as external
string resources without non-ASCII characters getting mangled.

Fixes: nodejs#10673
PR-URL: nodejs#11129
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: James M Snell <jasnell@gmail.com>
jasnell pushed a commit that referenced this issue Mar 7, 2017
Detect it when source files in lib/ are not ASCII.  Decode them as UTF-8
and store them as UTF-16 in the binary so they can be used as external
string resources without non-ASCII characters getting mangled.

Fixes: #10673
PR-URL: #11129
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: James M Snell <jasnell@gmail.com>
MylesBorins pushed a commit that referenced this issue Mar 9, 2017
Detect it when source files in lib/ are not ASCII.  Decode them as UTF-8
and store them as UTF-16 in the binary so they can be used as external
string resources without non-ASCII characters getting mangled.

Fixes: #10673
PR-URL: #11129
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: James M Snell <jasnell@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
build Issues and PRs related to build files or the CI. lib / src Issues and PRs related to general changes in the lib or src directory.
Projects
None yet
Development

No branches or pull requests

7 participants