Apply functions: optimizations #1358

Merged: 6 commits into master, Jan 13, 2023

Conversation

@vouillon (Member) commented Dec 15, 2022

Getting the arity of a function is slow with v8. On Chrome, function p is about ten times slower than function q:

function f (x) {}
f.l = 1;
var s = 0;
function p() { for (var i = 0; i < 100000000; i++) { (f.length == 1) ? s += 1 : 0; } }
function q() { for (var i = 0; i < 100000000; i++) { (f.l == 1) ? s += 1 : 0; } }
p(); q()


node takes 3.1s to execute the following OCaml code without this optimization and 1.3s with it (and 0.8s when manually removing the call to caml_call1).

let l = List.of_seq (Seq.init 10000 (fun i -> i))
let s = ref 0
let f x = s := !s + x
let iter f = for i = 1 to 10000 do List.iter f l; s := 0; List.iter f l done
let () = iter f

I have known about this issue for a long time. I was considering wrapping functions inside an object: {arity:2, fun:function(...){...}}, with some optimization for when the function is statically known. But that would have been a large change with some impact on the generated code size and a lot of added complexity in the Js_of_ocaml compiler. This change seems to address the issue at a very low cost.

Related issue: #1246
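
For context, here is a minimal sketch of the fast path this change relies on, assuming the cached arity lives in a property called .l as discussed in this thread; this is an illustration, not the exact code the compiler generates:

// Illustration only, not the actual generated wrapper.
// The call helper compares the cached arity (filling it from f.length the
// first time) and only falls back to the generic caml_call_gen when the
// arity does not match the number of supplied arguments.
function caml_call1(f, a0) {
  var arity = f.l >= 0 ? f.l : (f.l = f.length);
  return arity === 1 ? f(a0) : caml_call_gen(f, [a0]);
}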

@hhugo (Member) commented Dec 16, 2022

Is this a draft on purpose? What's the plan?

@pmwhite (Contributor) commented Dec 16, 2022

f.length is slow with v8, so we cache the function arity in another property

Also, I'd be very interested in seeing some documentation of this aspect of v8. How do you know f.length is slow?

@vouillon (Member, Author)

Is this a draft on purpose? What's the plan?

I would like to do more benchmarking.

@vouillon (Member, Author)

Also, I'd be very interested in seeing some documentation of this aspect of v8. How do you know f.length is slow?

I have not seen any documentation on this; I just noticed it at some point. This was also reported in #1246.

@TyOverby (Collaborator) commented Dec 17, 2022

I pulled this change into our repo and saw a huge improvement on microbenchmarks (a 50% runtime reduction in some cases), but a much smaller improvement (2% to 3%) on our benchmarks that attempt to replicate real-world applications and workloads. I suspect this is because the fallback case in caml_call_gen returns a function whose length is 0, which means that partially applied functions don't benefit from the arity caching.

I suspect that if caml_call_gen checked how many arguments are left unapplied and returned functions specialized to that arity, then the arity cache proposed here would be hit far more often (a rough sketch of the idea follows).
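
To make the suggestion concrete, here is a simplified sketch, assuming the .l cache from this PR; it is not the PR's actual implementation, just an illustration of returning closures specialized to the number of missing arguments and recording their arity in the cache:

// Simplified, hypothetical sketch of the idea above; the real caml_call_gen
// in the js_of_ocaml runtime handles more cases and is structured differently.
function caml_call_gen(f, args) {
  var n = (f.l >= 0 ? f.l : (f.l = f.length)) | 0;
  var d = n - args.length;
  if (d === 0) return f.apply(null, args);
  if (d < 0)
    // over-application: call f on its first n arguments, then apply the rest
    return caml_call_gen(f.apply(null, args.slice(0, n)), args.slice(n));
  // under-application: return a closure of the exact missing arity
  var g;
  if (d === 1)
    g = function (x) { return caml_call_gen(f, args.concat([x])); };
  else if (d === 2)
    g = function (x, y) { return caml_call_gen(f, args.concat([x, y])); };
  else
    g = function () {
      return caml_call_gen(f, args.concat(Array.prototype.slice.call(arguments)));
    };
  g.l = d; // cache the arity of the partial application so later calls hit the fast path
  return g;
}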

@TyOverby (Collaborator)

partially-applied functions don't benefit from length caching

This is because if f.l == 0, the left operand of the || operator in f.l || (f.l = f.length) is falsy, so the right-hand side is evaluated on every call.

@hhugo (Member) commented Dec 19, 2022

Not sure about the performance, but one could use a shorter syntax

f.l >= 0 ? f.l : f.l = f.length

instead of

f.l === undefined ? f.l = f.length : f.l
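
The difference between the two forms matters exactly for arity 0: with ||, a cached value of 0 is falsy and the length gets re-read on every call, whereas the >= 0 test also caches 0. A tiny illustration, with hypothetical helper names:

// Hypothetical helpers, purely for illustration.
function arityWithOr(f)   { return f.l || (f.l = f.length); }           // re-reads f.length whenever the arity is 0
function arityWithTest(f) { return f.l >= 0 ? f.l : (f.l = f.length); } // caches and reuses an arity of 0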

@hhugo (Member) commented Dec 22, 2022

caml_js_function_arity should probably be updated as well to use (and set?) the cache.
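
As a hedged sketch of what that could look like, assuming the same .l property; the actual runtime change may differ, e.g. it might read the cache without setting it:

// Illustration only; not necessarily what the PR ends up doing.
function caml_js_function_arity(f) {
  return f.l >= 0 ? f.l : (f.l = f.length);
}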

@hhugo (Member) commented Dec 22, 2022

function caml_call_gen(f, args) {
  if(f.fun)
    return caml_call_gen(f.fun, args);
  //FIXME, can happen with too many arguments
  if(typeof f !== "function") return f;
  var n = f.length | 0;
...

Shouldn't we also use the cache for computing n ?

@TyOverby (Collaborator) commented Jan 1, 2023

I pulled this PR in for prototyping and added @hhugo's suggestion

-  var n = f.length | 0;
+  var n = (f.l >= 0 ? f.l : f.l = f.length) | 0;

And the benchmarks I mentioned

benchmarks that attempt to replicate real-world applications and workloads saw smaller (2% - 3%) improvements

saw improvements in the 10% to 30% range!

@vouillon (Member, Author) commented Jan 2, 2023

I pulled this PR in for prototyping and added @hhugo's suggestion

-  var n = f.length | 0;
+  var n = (f.l >= 0 ? f.l : f.l = f.length) | 0;

And the benchmarks I mentioned

benchmarks that attempt to replicate real-world applications and workloads saw smaller (2% - 3%) improvements

saw improvements in the 10% to 30% range!

@TyOverby Thanks a lot for your tests! I'm surprised this makes that much of a difference since this is on a slow path. Maybe that's the optimization for partially applied functions (33c3af8) that makes such a difference?

@vouillon changed the title from "Apply functions: cache function arity" to "Apply functions: optimizations" on Jan 2, 2023
@TyOverby (Collaborator) commented Jan 3, 2023

I'm surprised this makes that much of a difference since this is on a slow path. Maybe that's the optimization for partially applied functions (33c3af8) that makes such a difference?

I totally agree that it's almost all due to returning functions with specialized arity when partially applied. I just wanted to mention that I cherry-picked a change that wasn't already in the PR.

@TyOverby (Collaborator) commented Jan 4, 2023

I traced a few apps and found this distribution for the caml_call{n} functions:

n   occurrences
1   109
2   300
3   30
4   4
5   <1

Based on this, I think I'm happy with only building specialized partially-applied functions for the cases where only 1 or 2 args remain.

@hhugo (Member) commented Jan 12, 2023

@vouillon, should we move this PR forward?

@vouillon marked this pull request as ready for review on January 12, 2023, 17:23
@hhugo (Member) commented Jan 13, 2023

I've rebased the PR, fixed the tests, and rewrote the history.

@hhugo merged commit e874f26 into master on Jan 13, 2023