Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Iterator performance #6137

Closed
filmackay opened this issue Mar 13, 2014 · 8 comments
Closed

Iterator performance #6137

filmackay opened this issue Mar 13, 2014 · 8 comments
Labels
performance Must go faster

Comments

@filmackay
Copy link

I'm finding that there is a large performance overhead with relatively simple iterators. I find that when I start modify state, I get into trouble by a couple of orders:

using Base

begin
    ### Iterator wrapper
    immutable type IterWrapper{I}
        iter::I
    end

    Base.start(s::IterWrapper) = start(s.iter)
    Base.next(s::IterWrapper, state) = next(s.iter, state)
    Base.done(s::IterWrapper, state) = done(s.iter, state)
end

begin
    ### Iterator add one: mutate the state

    immutable type IterAddOne{I}
        iter::I
    end

    Base.start(s::IterAddOne) = start(s.iter)

    Base.next(s::IterAddOne, state) = begin
        (value, state) = next(s.iter, state)
        (value + 1, state)
    end

    Base.done(s::IterAddOne, state) = done(s.iter, state)
end

begin
    ### Iterator state hack: reshape the state, but it's a noop

    immutable type IterStateHack{I}
        iter::I
    end

    Base.start(s::IterStateHack) = (0, start(s.iter))

    Base.next(s::IterStateHack, state) = begin
        (_, state) = state
        (value, state) = next(s.iter, state)
        (value, (0, state))
    end

    Base.done(s::IterStateHack, state) = begin
        (_, state) = state
        done(s.iter, state)
    end
end

begin
    ### Iterator stutter: repeats each item

    immutable type IterStutter{I}
        iter::I
    end

    Base.start(s::IterStutter) = (0, start(s.iter))

    Base.next(s::IterStutter, state) = begin
        (oldvalue, state) = state
        if oldvalue == 0
            (newvalue, state) = next(s.iter, state)
            oldvalue = newvalue
        else
            newvalue = oldvalue
            oldvalue = 0
        end
        (newvalue, (oldvalue, state))
    end

    Base.done(s::IterStutter, state) = begin
        (oldvalue, state) = state
        oldvalue != () && done(s.iter, state)
    end
end

macro run(call)
    quote
        # Warm-up first
        $call

        @time $call
    end
end

input = 1:1000000

println("*** raw")
@run sum(input)

println("*** IterWrapper")
@run sum(IterWrapper(input))

println("*** IterAddOne")
@run sum(IterAddOne(input))

println("*** IterStateHack")
@run sum(IterStateHack(input))

println("*** IterStutter")
@run sum(IterStutter(input))

Output:

*** raw
elapsed time: 2.414e-6 seconds (6716 bytes allocated)
*** IterWrapper
elapsed time: 0.00046161 seconds (88 bytes allocated)
*** IterAddOne
elapsed time: 0.002700015 seconds (88 bytes allocated)
*** IterStateHack
elapsed time: 0.22322219 seconds (207967480 bytes allocated)
*** IterStutter
elapsed time: 0.518856549 seconds (431926440 bytes allocated)

The raw v IterWrapper performance is understandable as the former is an AbstractArray. But the jump in allocation/time from IterWrapper to IterAddOne / IterStateHack..?

I'm essentially wanting to do far more complex iterators since I work with them rather than arrays. In some cases worse than Python, this leads me to think there's something wrong with my approach.

Thoughts?

@vtjnash
Copy link
Sponsor Member

vtjnash commented Mar 13, 2014

It seems like Jeff needs to merge #3796. Perhaps you can confirm?

@andrewcooke
Copy link
Contributor

fwiw i saw something like this with enumerate yesterday (i failed to reproduce it in a simple test and moved on, but it fits the pattern above).

@lindahua
Copy link
Contributor

I thought #3796 might help.

@filmackay
Copy link
Author

Thanks @vtjnash - just reading the (long) history of this baby. Should I make changes to my code, or should the inlining changes just work?

@JeffBezanson
Copy link
Sponsor Member

It probably would, but the biggest issue is probably allocating tuples. We will have to improve that; you can't just inline everything into everything.

@filmackay
Copy link
Author

@JeffBezanson picked it. Small improvement (maybe).

Are there any changes afoot for tuple allocation? Is this an optimisation/fusing problem though - this seems to be delegated to LLVM currently.

FYI:

*** raw
elapsed time: 2.518e-6 seconds (6680 bytes allocated)
*** IterWrapper
elapsed time: 0.00028789 seconds (88 bytes allocated)
*** IterAddOne
elapsed time: 0.003181837 seconds (88 bytes allocated)
*** IterStateHack
elapsed time: 0.212021775 seconds (207967480 bytes allocated)
*** IterStutter
elapsed time: 0.509278007 seconds (431926440 bytes allocated)

@filmackay
Copy link
Author

Updated performance: IterWrapper has been optimised away (yay) but others have not moved:

*** raw
elapsed time: 2.589e-6 seconds (6904 bytes allocated)
*** IterWrapper
elapsed time: 2.573e-6 seconds (88 bytes allocated)
*** IterAddOne
elapsed time: 0.002297095 seconds (88 bytes allocated)
*** IterStateHack
elapsed time: 0.189266422 seconds (239942952 bytes allocated)
*** IterStutter
elapsed time: 0.469709382 seconds (527861032 bytes allocated)

@mbauman
Copy link
Sponsor Member

mbauman commented Apr 18, 2015

This has improved. Note that there was a type instability in your IterStateHack and IterStutter methods. I resolved those by using a different name for the argument, e.g.,:

    Base.next(s::IterStutter, st) = begin
        (oldvalue, state) = st
        ...

With those changes, this benchmark now shows:

*** raw
elapsed time: 6.749e-6 seconds (192 bytes allocated)
*** IterWrapper
elapsed time: 6.458e-6 seconds (224 bytes allocated)
*** IterAddOne
elapsed time: 9.006e-6 seconds (224 bytes allocated)
*** IterStateHack
elapsed time: 0.003326626 seconds (224 bytes allocated)
*** IterStutter
elapsed time: 0.008231482 seconds (224 bytes allocated)

LLVM now completely removes the benchmarking loops on all but the last two iterators. I think this can probably be closed — even in the cases where LLVM doesn't totally remove the loop the performance is quite reasonable for summing 1000000 elements (~5ns per loop).

@mbauman mbauman closed this as completed Apr 18, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Must go faster
Projects
None yet
Development

No branches or pull requests

6 participants