Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improved function arity analysis #1397

Merged
merged 8 commits into from
Mar 11, 2023
Merged

Improved function arity analysis #1397

merged 8 commits into from
Mar 11, 2023

Conversation

vouillon
Copy link
Member

@vouillon vouillon commented Feb 1, 2023

Addresses #594

The analysis is a bit costly, so we want to see whether this should be enabled by default.

There is also the option to make a coarser analysis (propagation information about what the functions returns but not through function parameters).

currently faster analysis full analysis
5.24 sec 5.19 sec: 1.01x faster 5.11 sec: 1.03x faster

@hhugo
Copy link
Member

hhugo commented Feb 2, 2023

@vouillon, I've rebased the branch and force-pushed after merging #1384

@hhugo
Copy link
Member

hhugo commented Feb 3, 2023

I don't know how to read the timing in the description. What is it about ? Why is the "full analysis" faster that the "faster" one ?

@vouillon
Copy link
Member Author

vouillon commented Feb 3, 2023

I don't know how to read the timing in the description. What is it about ? Why is the "full analysis" faster that the "faster" one ?

This is the timing of the generated code, when executing the following command:

node ocamlc.js -c ~/js_of_ocaml/benchmarks/sources/ml/*.ml

So, a more precise analysis yields a significant gain.

Compiling ocamlc.byte takes about 12s. The faster analysis adds about 0.35s. The full one adds about 0.8s.

@vouillon vouillon force-pushed the exact-calls branch 2 times, most recently from 74f2fa8 to fb90a0b Compare February 3, 2023 14:53
@hhugo
Copy link
Member

hhugo commented Feb 3, 2023

How is the size affected ? how many extra direct call are generated ?

@hhugo
Copy link
Member

hhugo commented Feb 5, 2023

We could enable the full analysis with -O3 only

@vouillon
Copy link
Member Author

vouillon commented Feb 7, 2023

How is the size affected ? how many extra direct call are generated ?

The code is about the same size. About 0.25% larger since we use a somewhat verbose notation for best performances:a[7].call(null, e1, ...,en). Indeed, a[7](e1, ..., en) would be interpreted as a method call on array a which is slower than using caml_calln. I tried patterns like (0||a[7])(...) or (0,a[7])(...) as a workaround, but they still seems slower than using call.

There are not that many additional extra direct calls but I think they are more likely to be performed repeatedly.

currently faster analysis full analysis
Code size 1912637 1917311 1917226
Direct calls 25883 +1409 + 1854

compiler/lib/specialize.ml Outdated Show resolved Hide resolved
compiler/lib/global_flow.ml Outdated Show resolved Hide resolved
@hhugo hhugo added this to the 5.1 milestone Feb 14, 2023
@hhugo
Copy link
Member

hhugo commented Feb 15, 2023

I've rebased your branch and make fast depend on profile (o1,o2,o3)

@hhugo
Copy link
Member

hhugo commented Feb 15, 2023

@vouillon, what do you want to do with this ?

@hhugo hhugo marked this pull request as ready for review February 22, 2023 01:15
@hhugo
Copy link
Member

hhugo commented Feb 22, 2023

I've rebased the PR again

@hhugo hhugo force-pushed the exact-calls branch 2 times, most recently from 53f2627 to d960511 Compare February 28, 2023 11:37
@hhugo
Copy link
Member

hhugo commented Mar 1, 2023

@vouillon, this is the last PR I'd like to sort out before the next release (see https://github.com/ocsigen/js_of_ocaml/milestone/10).
Should we merge or not ?

compiler/lib/global_flow.ml Outdated Show resolved Hide resolved
@hhugo hhugo removed this from the 5.1 milestone Mar 7, 2023
@vouillon
Copy link
Member Author

vouillon commented Mar 9, 2023

@hhugo I think it is ready

let apply_directly = J.call f params J.N in
let apply_directly =
(* Make sure we are performing a regular call, not a (slower)
method call *)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vouillon, can you provide some reference mentioning this optimization. I did some test and it seems to show the opposite.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have any reference, but without this, the improved analysis resulted in slower code.
I just tried is compiling ocamlc.byte (dune exec -- js_of_ocaml --opt 3 which ocamlc.byte -o /tmp/ocamlc.js).
And then running it on some ml source files:

time node /tmp/ocamlc.js -c ./benchmarks/sources/ml/*.ml ./benchmarks/sources/ml/*.ml ./benchmarks/sources/ml/*.ml

This optimization makes close to a 8% performance improvement on my machine.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What ocaml version do you use ? it seems that ocamlc no longer works when compiled to js due to ocaml/ocaml#11997 (since OCaml 5.1). The reason is that Reloc_literal can now contain floats that jsoo doesn't want to marshal.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried with Ocaml 4.14.0.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This optimization makes close to a 8% performance improvement on my machine.

Results are too noisy on my laptop but I can see some improvements indeed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm using python3 -m pyperf system tune to reduce the noise.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants