-
Notifications
You must be signed in to change notification settings - Fork 30.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
src: simplify Javascript code embedding #27095
Conversation
* simplify js2c semantics
/CC @nodejs/c++ |
Can you elaborate on why? Encoding files without multibyte characters with one byte strings gives us some memory savings. |
Hopefully we can follow V8 and eliminate this completely in favor of a code snapshot. |
I believe there's still a difference when the code are parsed and stored in v8?
Um, I don't think so? At least they are not stored that way, and diff --git a/src/objects.cc b/src/objects.cc
index f27f067624..e6e20de9f3 100644
--- a/src/objects.cc
+++ b/src/objects.cc
@@ -5278,6 +5278,7 @@ Handle<Object> SharedFunctionInfo::GetSourceCode(
if (!shared->HasSourceCode()) return isolate->factory()->undefined_value();
Handle<String> source(String::cast(Script::cast(shared->script())->source()),
isolate);
+ DCHECK(!String::IsOneByteRepresentationUnderneath(*source));
return isolate->factory()->NewSubString(source, shared->StartPosition(),
shared->EndPosition());
}
While I generally agree with this, we usually trade simplicity for size savings and optimization, not the other way around, so someone could just reverse this patch in the name of saving size.
I don't think we can eliminate this completely? For one most builtins are not loaded during bootstrap and are not very likely to be used in normal use cases (e.g. scripts in deps). And a lot of builtins are not side-effect-free modules but scripts that can e.g. exit the process when executed so we cannot snapshot them. |
Maybe we should have benchmarks for binary sizes and RSS/heap sizes to determine the actual impact in cases like this.. |
I'll do that. As for code scanning, I was going by this: If I got it the wrong way around, we could (1) save everything as UTF-8 encoded As for size/speed AFAICT we tend to prefer speed over size (within reason of course): Line 202 in 2f1ed5c
|
BTW 657c979 is all the unicode code-point in our codebase |
I remember we explicitly leave multibyte characters in our code base to make sure our code compiles that way? (but I have no context anymore, maybe @Fishrock123 @bnoordhuis would know) |
@refack I believe in our case, the implementation that the article talks about translates to https://cs.chromium.org/chromium/src/v8/src/parsing/scanner-character-streams.cc?type=cs&q=BufferedCharacterStream&sq=package:chromium&g=0&l=250 ? |
Lets see what comes out of the startup benchmark: https://ci.nodejs.org/view/Node.js%20benchmark/job/benchmark-node-micro-benchmarks/325/ |
From the benchmark |
File size has for Windows increased by 8%, so I'd say it's probably not worth it: D:\refael\Downloads>dir /s node.exe
Directory of D:\refael\Downloads\node-both-x64
2019-04-04 21:38 26,054,656 node.exe
Directory of D:\refael\Downloads\node-utf16-x64
2019-04-04 21:28 27,846,144 node.exe
Directory of D:\refael\Downloads\node-both.exe
2019-04-04 21:23 21,798,912 node.exe
Directory of D:\refael\Downloads\node-utf16.exe
2019-04-04 21:25 23,717,888 node.exe |
i'm not as knowledgeable as joyee is about the compile pipeline but moving to u16 without any significant performance improvement seems like a 👎 |
* use `string_view` instead of `UnionBytes`
I've taken this is a different direction. Replaced So this is blocked until we get gcc-6 level compilers on all platforms. |
That's correct, see #10673 for background. |
Yes, some code comments in timers use multi-byte characters (the ascii diagram). Those characters could be safely discarded in a build if necessary, I think. |
as long as the discard method preserves the proper string length (because of stack traces) this should be fine. |
8ae28ff
to
2935f72
Compare
This stalled and hasn't been moved forward in over a year. Closing but it can be reopened if it is picked back up again |
Based on #25518 so only 9a5b688 is new code
This simplifies how we embed the javascript into the binary:
uint16_t[]
(no need forUnionBytes
anymore)UInt16Span
that is independent of V8 (reduced the amount of definition need to be included)ToStringChecked
into a static class method that is the only place that createsUInt16SpanResource
Checklist
make -j4 test
(UNIX), orvcbuild test
(Windows) passes