Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

improve some IntSet operations #23138

Merged
merged 1 commit into from
Aug 5, 2017
Merged

improve some IntSet operations #23138

merged 1 commit into from
Aug 5, 2017

Conversation

rfourquet
Copy link
Member

@rfourquet rfourquet commented Aug 5, 2017

  • IntSet has a new invariant: the length of the underlying BitArray
    is always a multiple of 64
  • the size is set to the minimum possible: the underlying routines
    are in charge of allocating enough (like for other containers);
    this can help to obtain the last element faster (which is useful
    e.g in rand(::IntSet))
  • _matched_map avoids extending a BitArray when unecessary
    (but doesn't work anymore for all 2^4 kinds of binary predicate,
    though it could be handled if needed in the future)

A case where this could be slower is for example in:

s = IntSet()
for i=1:1000
    push!(s, i)
end

But I could not really/reliably measure a difference (disclaimer: BenchmarkTools is broken recently on my installation, so I use a hand-made "equivalent"), and sizehint! can help here. The improvements for some operations can be huge, e.g. symdiff!(a, b) where a=IntSet([2^20]), b=IntSet([1]) decreases from 45.07µs to 10ns. Also, something like rand(s) where s=IntSet([1, 2, 3]) decreases from about 980ns to 310ns.

* IntSet has a new invariant: the length of the underlying BitArray
  is always a multiple of 64
* the size is set to the minimum possible: the underlying routines
  are in charge of allocating enough (like for other containers);
  this can help to obtain the last element faster (which is useful
  e.g in `rand(::IntSet)`)
* _matched_map avoids extending a BitArray when unecessary
  (but doesn't work anymore for all 2^4 kinds of binary predicate,
  though it could be handled if needed in the future)
@rfourquet rfourquet added the performance Must go faster label Aug 5, 2017
end
b
end
@inline in(n::Integer, s::IntSet) = get(s.bits, n, false)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated change, but simpler.

@ararslan ararslan added the collections Data structures holding multiple items, e.g. sets label Aug 5, 2017
@ararslan
Copy link
Member

ararslan commented Aug 5, 2017

@nanosoldier runbenchmarks(ALL, vs=":master")

@nanosoldier
Copy link
Collaborator

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan

@JeffBezanson JeffBezanson merged commit 4dd7551 into master Aug 5, 2017
@JeffBezanson JeffBezanson deleted the rf/intset-optim branch August 5, 2017 23:52
@tkelman
Copy link
Contributor

tkelman commented Aug 6, 2017

stockcorr and array indexing regressions have a chance of being real, should keep an eye on the nightly results

@fredrikekre
Copy link
Member

Given ae17198#commitcomment-23501641 it might be worth looking into the array indexing regression?

@rfourquet
Copy link
Member Author

Looking at the very few places in base where IntSet are used, I have a hard time understanding how this change could affect these 2 tests... Also, I cannot reproduce locally, i.e. the regressions on those two tests, "stockcorr" and "sumvector_view", stay within 5% (for some reason, the @benchmark macro fails in my REPL, but running BaseBenchmarks is ok :-/ )

@tkelman
Copy link
Contributor

tkelman commented Aug 6, 2017

doesn't inference use them in some key places? otherwise they'd likely be moved out to DataStructures

@rfourquet
Copy link
Member Author

Inference uses them, but I thought it wouldn't influence the run time performance and the benchmarks.

rfourquet added a commit that referenced this pull request Sep 2, 2017
rfourquet added a commit that referenced this pull request Sep 2, 2017
Now IntSet have the length of their underlying BitVector be
a multiple of 64, so resizing them to something else breaks
this invariant.
JeffBezanson added a commit that referenced this pull request Sep 2, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
collections Data structures holding multiple items, e.g. sets performance Must go faster
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants