Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cleanup array_has #12460

Merged
merged 2 commits into from
Sep 16, 2024
Merged

cleanup array_has #12460

merged 2 commits into from
Sep 16, 2024

Conversation

samuelcolvin
Copy link
Contributor

@samuelcolvin samuelcolvin commented Sep 13, 2024

Rationale for this change

While working on #12459 I noticed a few things that could do with being cleaned up.

What changes are included in this PR?

  • Rewrite ArrayHas::invoke to be cleaner and easier to understand — logic hasn't changed
  • Rewrite array_has_dispatch_for_scalar to use BooleanArray builder, not a Vec Small tweak to array_has_dispatch_for_scalar

Are these changes tested?

Tests for array_has already exist.

Are there any user-facing changes?

no.

@@ -203,24 +192,26 @@ fn array_has_dispatch_for_scalar<O: OffsetSizeTrait>(
return Ok(Arc::new(BooleanArray::from(vec![Some(false)])));
}
let eq_array = compare_with_eq(values, needle, is_nested)?;
let mut final_contained = vec![None; haystack.len()];
for (i, offset) in offsets.windows(2).enumerate() {
let mut final_contained = BooleanArray::builder(haystack.len());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is using builder faster then From<Vec>?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Turns out what you had was faster, appologies, I had assumed boolean builder existed because it was significantly faster. Reverting.

BooleanArray_builder    time:   [18.503 µs 18.518 µs 18.532 µs]
                        change: [-0.1321% +0.0661% +0.2517%] (p = 0.51 > 0.05)
                        No change in performance detected.
Found 10 outliers among 100 measurements (10.00%)
  1 (1.00%) low severe
  2 (2.00%) low mild
  7 (7.00%) high mild

BooleanArray_vec        time:   [12.353 µs 12.496 µs 12.653 µs]
                        change: [-26.653% -26.272% -25.814%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 15 outliers among 100 measurements (15.00%)
  1 (1.00%) low mild
  1 (1.00%) high mild
  13 (13.00%) high severe
extern crate criterion;

use arrow::array::BooleanArray;
use criterion::{black_box, criterion_group, criterion_main, Criterion};

fn criterion_benchmark(c: &mut Criterion) {
    c.bench_function("BooleanArray_builder", |b| {
        b.iter(|| {
            let mut final_contained = BooleanArray::builder(8192);
            for i in 0..8192 {
                if i % 3 == 0 {
                    final_contained.append_value(true);
                } else if i % 2 == 0 {
                    final_contained.append_value(false);
                } else {
                    final_contained.append_null();
                }
            }
            black_box(final_contained.finish());
        })
    });

    c.bench_function("BooleanArray_vec", |b| {
        b.iter(|| {
            let mut vec = vec![None; 8192];
            for i in 0..8192 {
                if i % 3 == 0 {
                    vec[i] = Some(true);
                } else if i % 2 == 0 {
                    vec[i] = Some(false);
                }
            }
            black_box(BooleanArray::from(vec));
        })
    });
}

criterion_group!(benches, criterion_benchmark);
criterion_main!(benches);

Copy link
Contributor

@jayzhan211 jayzhan211 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@alamb
Copy link
Contributor

alamb commented Sep 16, 2024

🚀

@alamb alamb merged commit 25c34f9 into apache:main Sep 16, 2024
25 checks passed
@alamb
Copy link
Contributor

alamb commented Sep 16, 2024

Thanks @samuelcolvin and @jayzhan211

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants