-
Notifications
You must be signed in to change notification settings - Fork 188
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improved function arity analysis #1397
Conversation
456af07
to
c09e350
Compare
I don't know how to read the timing in the description. What is it about ? Why is the "full analysis" faster that the "faster" one ? |
This is the timing of the generated code, when executing the following command:
So, a more precise analysis yields a significant gain. Compiling |
74f2fa8
to
fb90a0b
Compare
How is the size affected ? how many extra direct call are generated ? |
We could enable the full analysis with |
The code is about the same size. About 0.25% larger since we use a somewhat verbose notation for best performances: There are not that many additional extra direct calls but I think they are more likely to be performed repeatedly.
|
I've rebased your branch and make |
@vouillon, what do you want to do with this ? |
I've rebased the PR again |
53f2627
to
d960511
Compare
@vouillon, this is the last PR I'd like to sort out before the next release (see https://github.com/ocsigen/js_of_ocaml/milestone/10). |
@hhugo I think it is ready |
let apply_directly = J.call f params J.N in | ||
let apply_directly = | ||
(* Make sure we are performing a regular call, not a (slower) | ||
method call *) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vouillon, can you provide some reference mentioning this optimization. I did some test and it seems to show the opposite.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have any reference, but without this, the improved analysis resulted in slower code.
I just tried is compiling ocamlc.byte
(dune exec -- js_of_ocaml --opt 3
which ocamlc.byte -o /tmp/ocamlc.js
).
And then running it on some ml source files:
time node /tmp/ocamlc.js -c ./benchmarks/sources/ml/*.ml ./benchmarks/sources/ml/*.ml ./benchmarks/sources/ml/*.ml
This optimization makes close to a 8% performance improvement on my machine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What ocaml version do you use ? it seems that ocamlc no longer works when compiled to js due to ocaml/ocaml#11997 (since OCaml 5.1). The reason is that Reloc_literal can now contain floats that jsoo doesn't want to marshal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried with Ocaml 4.14.0.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This optimization makes close to a 8% performance improvement on my machine.
Results are too noisy on my laptop but I can see some improvements indeed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm using python3 -m pyperf system tune
to reduce the noise.
Addresses #594
The analysis is a bit costly, so we want to see whether this should be enabled by default.
There is also the option to make a coarser analysis (propagation information about what the functions returns but not through function parameters).