Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Batch overloads using an array pool #856

Merged
merged 35 commits into from
Nov 13, 2022
Merged

Conversation

atifaziz
Copy link
Member

@atifaziz atifaziz commented Oct 29, 2022

This PR adds Batch overloads using an array pool. It provides an alternative approach to PR #833 by @leandromoh.

The new overloads accept an ArrayPool<> argument, which is then used to rent an array to back each bucket. The rented array is returned to the pool before proceeding with the next bucket, for which a new rental is made. Chances are good that the same array ends up backing each bucket.

The overloads also introduce ICurrentBuffer<T>, which is effectively a IList<T> but one that gets updated in-place, thus bringing some notion of currency. An ICurrentBuffer<T> is only valid during a Batch operation and is never meant to escape or be used outside of the invocation (via side-effects). Doing so will lead to undefined behaviour. The reason for not using a plain IList<T> is to make all this painfully obvious through the signatures.

In terms of usage, it's fairly straightforward. Suppose the following use of Batch (regular, non-pooled) to sum a million numbers in chunks of 10,000:

var q =
    from b in MoreEnumerable.Generate(1L, x => x + 1)
                            .Take(1_000_000)
                            .Batch(10_000)
    select b.Sum();

The same can be done with the new Batch as follows:

var q =
    MoreEnumerable.Generate(1L, x => x + 1)
                  .Take(1_000_000)
                  .Batch(10_000, ArrayPool<long>.Shared, Enumerable.Sum);

The first version will allocate 100 arrays whereas the second one should use a single array.


PS These Batch overloads are considered experimental.

@atifaziz atifaziz self-assigned this Oct 29, 2022
@atifaziz
Copy link
Member Author

Here are some initial benchmarks:

Method Size Mean Error StdDev Gen0 Gen1 Gen2 Allocated
BatchRegular 1000 15.44 ms 0.282 ms 0.357 ms 1906.2500 - - 7867.58 KB
BatchPooled 1000 18.28 ms 0.349 ms 0.343 ms - - - 47.46 KB
BatchRegular 10000 15.66 ms 0.300 ms 0.345 ms 1875.0000 31.2500 - 7818.36 KB
BatchPooled 10000 19.01 ms 0.366 ms 0.406 ms - - - 5.27 KB
BatchRegular 100000 17.22 ms 0.285 ms 0.267 ms 2468.7500 2468.7500 2468.7500 7814.25 KB
BatchPooled 100000 18.73 ms 0.286 ms 0.239 ms - - - 1.05 KB

The benchmark code was as follows:

using System.Linq;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Jobs;
using BenchmarkDotNet.Running;
using MoreLinq;
using MoreLinq.Experimental;

_ = BenchmarkRunner.Run<Benchmarks>();

[SimpleJob(RuntimeMoniker.Net60)]
[MemoryDiagnoser]
public class Benchmarks
{
    [Params(1_000, 10_000, 100_000)]
    public int Size { get; set; }

    [Benchmark]
    public void BatchRegular()
    {
        MoreEnumerable
            .Generate(1L, x => x + 1)
            .Take(1_000_000)
            .Batch(Size)
            .Select(Enumerable.Sum)
            .Consume();
    }

    [Benchmark]
    public void BatchPooled()
    {
        MoreEnumerable
            .Generate(1L, x => x + 1)
            .Take(1_000_000)
            .Batch(Size, ArrayPool<long>.Shared, Enumerable.Sum)
            .Consume();
    }
}

@codecov
Copy link

codecov bot commented Oct 29, 2022

Codecov Report

Merging #856 (aa05519) into master (ee8abeb) will not change coverage.
The diff coverage is n/a.

❗ Current head aa05519 differs from pull request most recent head ec82d1b. Consider uploading reports for the commit ec82d1b to get more accurate results

@@           Coverage Diff           @@
##           master     #856   +/-   ##
=======================================
  Coverage   92.45%   92.45%           
=======================================
  Files         108      108           
  Lines        3434     3434           
  Branches     1025     1025           
=======================================
  Hits         3175     3175           
  Misses        197      197           
  Partials       62       62           

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

MoreLinq/Experimental/Batch.cs Outdated Show resolved Hide resolved
MoreLinq/Experimental/CurrentList.cs Outdated Show resolved Hide resolved
MoreLinq/Experimental/Batch.cs Outdated Show resolved Hide resolved
These changes belonged with:

> commit 61dbc84
> Author: Atif Aziz <code@raboof.com>
> Date:   Sat Oct 29 19:11:45 2022 +0200
>
>     Remove "AsSpan" from current list
MoreLinq/Experimental/Batch.cs Outdated Show resolved Hide resolved
MoreLinq/Experimental/Batch.cs Outdated Show resolved Hide resolved
MoreLinq/Experimental/Batch.cs Outdated Show resolved Hide resolved
atifaziz and others added 4 commits November 12, 2022 12:45
Co-authored-by: Stuart Turner <stuart@turner-isageek.com>
Co-authored-by: Stuart Turner <stuart@turner-isageek.com>
Co-authored-by: Stuart Turner <stuart@turner-isageek.com>
@atifaziz atifaziz merged commit ba487dd into morelinq:master Nov 13, 2022
@atifaziz atifaziz deleted the batch-pool branch November 13, 2022 17:55
@atifaziz atifaziz mentioned this pull request Jan 15, 2023
@atifaziz atifaziz added this to the 3.4.0 milestone Jun 17, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants