Improve performance #1

Open

tisztamo wants to merge 1 commit into master
Conversation

tisztamo

A possible fix for the missing performance gain.

julia> demo_sum()
Baseline:
  4.649 s (0 allocations: 0 bytes)
Catwalk defaults:
  4.210 s (5607369 allocations: 116.47 MiB)
Catwalk tuned:
  4.007 s (2775829 allocations: 55.60 MiB)

The problem was that

@inline next(rf::R_{Map}, result, input) = next(inner(rf), result, xform(rf).f(input))

was called before the Catwalked method of next, resulting in a non-jitted dynamic dispatch.
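
As a minimal sketch of the kind of dispatch I mean (toy functions only, not the actual Transducers/Catwalk code; type_instability and asint are stand-ins):

```julia
# Toy sketch: `type_instability` has a value-dependent return type, so the
# compiler cannot infer the type of its result, and the call `f(y)` inside the
# inlined wrapper is resolved by dynamic dispatch at runtime.
type_instability(x) = x > 0 ? x : Float64(x)   # Int or Float64, depending on the value
asint(x) = round(Int, x)

@inline apply_inner(f, y) = f(y)               # plays the role of next(inner(rf), ...)

function fold_demo(xs)
    acc = 0
    for x in xs
        acc += apply_inner(asint, type_instability(x))  # dynamic dispatch here
    end
    return acc
end

fold_demo(rand(-5:5, 1_000))
```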

I am not sure whether what I did is reasonable in the larger context, but I hope you can build a proper fix on top of this.

Also, the default batch size was too small, so I have increased it to 1e6; that may be larger than ideal, and more tests are needed.

When testing with @btime, the initial overhead should be small, but I see a small amount of compilation in every Catwalked run, which is why the tested runtimes have to be several seconds. I will look into that, but I like to test cold runs with @time anyway: Catwalk adds significant compilation overhead, and not measuring it seems unfair.
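
For reference, a sketch of the two measurement styles mentioned here (demo_sum being the benchmark entry point shown above):

```julia
using BenchmarkTools

# @btime reports the minimum over many warm samples, so it mostly hides
# one-time compilation; a cold @time in a fresh session includes Catwalk's
# compilation overhead in the measurement.
@btime demo_sum()
@time demo_sum()   # first call in a fresh session: compile-inclusive timing
```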

@tkf (Member) commented Mar 29, 2021

The problem was that

@inline next(rf::R_{Map}, result, input) = next(inner(rf), result, xform(rf).f(input))

was called before the Catwalked method of next, resulting in a non-jitted dynamic dispatch.

Oooh, I see. Unfortunately, the patch in this PR is not generic enough, since in general we can't assume any particular structure outer/left of OptimizeInner() (or inner/right of it, either). For example, it's reasonable to have

xs |> Filter(x -> x > 0) |> Map(type_instability) |> OptimizeInner() |> Map(asint)

(But it does help me understand the problem. Thanks!)

I'm not sure what the best strategy is, though. I think we need something like

@please_inline Transducers.next(rf::R_{OptimizeXF}, acc, @nospecialize(input)) = ...

in the Julia compiler to fully solve the problem; i.e., the compiler inlines this even though input cannot be inferred.

Meanwhile, maybe I should stop trying to support OptimizeInner(). Maybe it's still possible to support the case where the instability happens on the iterator side:

(type_instability(x) for x in xs) |> Map(asint)
#                                    ----------
#                                    JIT'ed
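
Sketched concretely with Transducers.jl (type_instability and asint are stand-in definitions, just to make the case runnable; not part of this PR):

```julia
using Transducers

type_instability(x) = x > 0 ? x : Float64(x)  # value-dependent return type
asint(x) = round(Int, x)

xs = rand(-5:5, 1_000)

# The instability lives on the iterator side; the transducer part, Map(asint),
# is the piece that could stay JIT'ed.
foldl(+, (type_instability(x) for x in xs) |> Map(asint))
```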

@tisztamo (Author) commented:

Yeah, I feared that simply eliminating that call was not the way to go...

About the compiler support: I struggle a lot with inlining, and being able to force it would yield measurable performance improvements in the original target of Catwalk.jl, but I was never brave enough to ask for it...

Forcing inlining from the call site seems a bit less risky in terms of accidental compilation overhead, and now we have a real use case. Do you think it is time to open an issue?

@tkf (Member) commented Mar 30, 2021

My guess is that many Julia programmers have wished for a forced/more controllable inlining macro at least once. I couldn't find such a request in the issue tracker, though, which is kind of strange. Maybe everyone assumed there already is one 😄. So yeah, I think it'd be nice to have an issue for this.

@tisztamo (Author) commented Jun 28, 2021

Great news, @tkf: JuliaLang/julia#41328 allows forced inlining! I have tested this case (only on non-folds, non-Catwalk sample code for now, as I have package installation issues after compiling 1.8-dev).
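
For instance, with the call-site annotation from that PR (Julia 1.8+), the caller can request inlining of a particular call; f and x here are just placeholders:

```julia
# Call-site inlining (JuliaLang/julia#41328, Julia 1.8+): the annotation applies
# to this specific call and tells the compiler to inline it regardless of its
# usual cost heuristics (when inlining is possible at all).
function apply_forced(f, x)
    @inline f(x)
end
```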

@tkf (Member) commented Jun 29, 2021

Thanks! Yeah, that's great news, especially for packages that heavily depend on higher-order functions, like Transducers.
