Iterator performance #6137

filmackay · 2014-03-13T02:00:00Z

I'm finding that there is a large performance overhead with relatively simple iterators. I find that when I start modify state, I get into trouble by a couple of orders:

using Base

begin
    ### Iterator wrapper
    immutable type IterWrapper{I}
        iter::I
    end

    Base.start(s::IterWrapper) = start(s.iter)
    Base.next(s::IterWrapper, state) = next(s.iter, state)
    Base.done(s::IterWrapper, state) = done(s.iter, state)
end

begin
    ### Iterator add one: mutate the state

    immutable type IterAddOne{I}
        iter::I
    end

    Base.start(s::IterAddOne) = start(s.iter)

    Base.next(s::IterAddOne, state) = begin
        (value, state) = next(s.iter, state)
        (value + 1, state)
    end

    Base.done(s::IterAddOne, state) = done(s.iter, state)
end

begin
    ### Iterator state hack: reshape the state, but it's a noop

    immutable type IterStateHack{I}
        iter::I
    end

    Base.start(s::IterStateHack) = (0, start(s.iter))

    Base.next(s::IterStateHack, state) = begin
        (_, state) = state
        (value, state) = next(s.iter, state)
        (value, (0, state))
    end

    Base.done(s::IterStateHack, state) = begin
        (_, state) = state
        done(s.iter, state)
    end
end

begin
    ### Iterator stutter: repeats each item

    immutable type IterStutter{I}
        iter::I
    end

    Base.start(s::IterStutter) = (0, start(s.iter))

    Base.next(s::IterStutter, state) = begin
        (oldvalue, state) = state
        if oldvalue == 0
            (newvalue, state) = next(s.iter, state)
            oldvalue = newvalue
        else
            newvalue = oldvalue
            oldvalue = 0
        end
        (newvalue, (oldvalue, state))
    end

    Base.done(s::IterStutter, state) = begin
        (oldvalue, state) = state
        oldvalue != () && done(s.iter, state)
    end
end

macro run(call)
    quote
        # Warm-up first
        $call

        @time $call
    end
end

input = 1:1000000

println("*** raw")
@run sum(input)

println("*** IterWrapper")
@run sum(IterWrapper(input))

println("*** IterAddOne")
@run sum(IterAddOne(input))

println("*** IterStateHack")
@run sum(IterStateHack(input))

println("*** IterStutter")
@run sum(IterStutter(input))

Output:

*** raw
elapsed time: 2.414e-6 seconds (6716 bytes allocated)
*** IterWrapper
elapsed time: 0.00046161 seconds (88 bytes allocated)
*** IterAddOne
elapsed time: 0.002700015 seconds (88 bytes allocated)
*** IterStateHack
elapsed time: 0.22322219 seconds (207967480 bytes allocated)
*** IterStutter
elapsed time: 0.518856549 seconds (431926440 bytes allocated)

The raw v IterWrapper performance is understandable as the former is an AbstractArray. But the jump in allocation/time from IterWrapper to IterAddOne / IterStateHack..?

I'm essentially wanting to do far more complex iterators since I work with them rather than arrays. In some cases worse than Python, this leads me to think there's something wrong with my approach.

Thoughts?

The text was updated successfully, but these errors were encountered:

vtjnash · 2014-03-13T02:48:28Z

It seems like Jeff needs to merge #3796. Perhaps you can confirm?

andrewcooke · 2014-03-13T03:03:00Z

fwiw i saw something like this with enumerate yesterday (i failed to reproduce it in a simple test and moved on, but it fits the pattern above).

lindahua · 2014-03-13T03:43:50Z

I thought #3796 might help.

filmackay · 2014-03-13T03:51:06Z

Thanks @vtjnash - just reading the (long) history of this baby. Should I make changes to my code, or should the inlining changes just work?

JeffBezanson · 2014-03-13T03:51:45Z

It probably would, but the biggest issue is probably allocating tuples. We will have to improve that; you can't just inline everything into everything.

filmackay · 2014-03-13T04:13:12Z

@JeffBezanson picked it. Small improvement (maybe).

Are there any changes afoot for tuple allocation? Is this an optimisation/fusing problem though - this seems to be delegated to LLVM currently.

FYI:

*** raw
elapsed time: 2.518e-6 seconds (6680 bytes allocated)
*** IterWrapper
elapsed time: 0.00028789 seconds (88 bytes allocated)
*** IterAddOne
elapsed time: 0.003181837 seconds (88 bytes allocated)
*** IterStateHack
elapsed time: 0.212021775 seconds (207967480 bytes allocated)
*** IterStutter
elapsed time: 0.509278007 seconds (431926440 bytes allocated)

filmackay · 2014-05-13T05:54:26Z

Updated performance: IterWrapper has been optimised away (yay) but others have not moved:

*** raw
elapsed time: 2.589e-6 seconds (6904 bytes allocated)
*** IterWrapper
elapsed time: 2.573e-6 seconds (88 bytes allocated)
*** IterAddOne
elapsed time: 0.002297095 seconds (88 bytes allocated)
*** IterStateHack
elapsed time: 0.189266422 seconds (239942952 bytes allocated)
*** IterStutter
elapsed time: 0.469709382 seconds (527861032 bytes allocated)

mbauman · 2015-04-18T19:46:46Z

This has improved. Note that there was a type instability in your IterStateHack and IterStutter methods. I resolved those by using a different name for the argument, e.g.,:

    Base.next(s::IterStutter, st) = begin
        (oldvalue, state) = st
        ...

With those changes, this benchmark now shows:

*** raw
elapsed time: 6.749e-6 seconds (192 bytes allocated)
*** IterWrapper
elapsed time: 6.458e-6 seconds (224 bytes allocated)
*** IterAddOne
elapsed time: 9.006e-6 seconds (224 bytes allocated)
*** IterStateHack
elapsed time: 0.003326626 seconds (224 bytes allocated)
*** IterStutter
elapsed time: 0.008231482 seconds (224 bytes allocated)

LLVM now completely removes the benchmarking loops on all but the last two iterators. I think this can probably be closed — even in the cases where LLVM doesn't totally remove the loop the performance is quite reasonable for summing 1000000 elements (~5ns per loop).

JeffBezanson added the performance label Mar 15, 2014

mbauman closed this as completed Apr 18, 2015

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Iterator performance #6137

Iterator performance #6137

filmackay commented Mar 13, 2014

vtjnash commented Mar 13, 2014

andrewcooke commented Mar 13, 2014

lindahua commented Mar 13, 2014

filmackay commented Mar 13, 2014

JeffBezanson commented Mar 13, 2014

filmackay commented Mar 13, 2014

filmackay commented May 13, 2014

mbauman commented Apr 18, 2015

Iterator performance #6137

Iterator performance #6137

Comments

filmackay commented Mar 13, 2014

vtjnash commented Mar 13, 2014

andrewcooke commented Mar 13, 2014

lindahua commented Mar 13, 2014

filmackay commented Mar 13, 2014

JeffBezanson commented Mar 13, 2014

filmackay commented Mar 13, 2014

filmackay commented May 13, 2014

mbauman commented Apr 18, 2015