Add ASCII fast path for ILIKE scalar (90% faster) #3306

Merged: tustvold merged 8 commits into apache:master on Dec 9, 2022


@tustvold (Contributor) commented Dec 9, 2022

Which issue does this PR close?

Closes #3311

Rationale for this change

ilike_utf8 scalar equals
                        time:   [218.99 µs 219.05 µs 219.10 µs]
                        change: [-90.262% -90.257% -90.253%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
  5 (5.00%) high mild
  4 (4.00%) high severe

Benchmarking ilike_utf8 scalar contains: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 9.1s, enable flat sampling, or reduce sample count to 50.
ilike_utf8 scalar contains
                        time:   [1.8097 ms 1.8104 ms 1.8111 ms]
                        change: [-56.369% -56.326% -56.264%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
  3 (3.00%) low mild
  4 (4.00%) high mild
  2 (2.00%) high severe

ilike_utf8 scalar ends with
                        time:   [261.52 µs 261.58 µs 261.65 µs]
                        change: [-88.863% -88.850% -88.831%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  1 (1.00%) high mild
  3 (3.00%) high severe

ilike_utf8 scalar starts with
                        time:   [265.40 µs 265.45 µs 265.50 µs]
                        change: [-88.692% -88.679% -88.656%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
  2 (2.00%) low mild
  6 (6.00%) high mild
  3 (3.00%) high severe

Benchmarking ilike_utf8 scalar complex: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 9.2s, enable flat sampling, or reduce sample count to 50.
ilike_utf8 scalar complex
                        time:   [1.8319 ms 1.8327 ms 1.8334 ms]
                        change: [-4.2733% -4.1030% -3.9344%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  3 (3.00%) high mild
  1 (1.00%) high severe

nilike_utf8 scalar equals
                        time:   [232.53 µs 232.59 µs 232.65 µs]
                        change: [-90.314% -90.298% -90.265%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  5 (5.00%) high mild
  3 (3.00%) high severe

Benchmarking nilike_utf8 scalar contains: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 9.1s, enable flat sampling, or reduce sample count to 50.
nilike_utf8 scalar contains
                        time:   [1.7942 ms 1.7962 ms 1.7993 ms]
                        change: [-57.816% -57.765% -57.701%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  3 (3.00%) high mild
  3 (3.00%) high severe

nilike_utf8 scalar ends with
                        time:   [243.52 µs 243.58 µs 243.66 µs]
                        change: [-89.761% -89.758% -89.754%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
  1 (1.00%) low mild
  7 (7.00%) high mild
  1 (1.00%) high severe

nilike_utf8 scalar starts with
                        time:   [253.10 µs 253.18 µs 253.28 µs]
                        change: [-89.367% -89.347% -89.313%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  1 (1.00%) high mild
  5 (5.00%) high severe

Benchmarking nilike_utf8 scalar complex: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 9.2s, enable flat sampling, or reduce sample count to 50.
nilike_utf8 scalar complex
                        time:   [1.8143 ms 1.8152 ms 1.8162 ms]
                        change: [-5.6406% -5.5148% -5.3303%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
  4 (4.00%) high mild
  5 (5.00%) high severe

What changes are included in this PR?

Are there any user-facing changes?

github-actions bot added the arrow label (Changes to the arrow crate) Dec 9, 2022
@tustvold force-pushed the ilike-fast-path branch 2 times, most recently from 5d73cc4 to 160ee74, December 9, 2022 12:18
@tustvold (Contributor Author) commented Dec 9, 2022

FYI @askoa

@tustvold requested a review from viirya December 9, 2022 12:24
    } else if right.starts_with('%') && !right[1..].contains(is_like_pattern) {
        // fast path, can use ends_with
        let ends_str = &right[1..].to_uppercase();
        // If not ASCII faster to use case insensitive regex than allocating using to_uppercase
Contributor review comment, suggested change:

- // If not ASCII faster to use case insensitive regex than allocating using to_uppercase
+ // If ASCII faster to use case insensitive regex than allocating using to_uppercase

Contributor Author (@tustvold) replied:

?
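For context on the thread above: the excerpt's `ends_with` branch uppercases the pattern suffix (and, presumably, each value) before comparing, which allocates. A minimal sketch of the non-allocating alternative the PR title points to, assuming both inputs are already known to be ASCII; the function names are hypothetical and are not the arrow-rs API:

```rust
/// Allocating approach: uppercase both sides with full Unicode mappings,
/// then test the suffix. Works for ASCII, but allocates two `String`s.
fn ends_with_uppercase(value: &str, suffix: &str) -> bool {
    value.to_uppercase().ends_with(&suffix.to_uppercase())
}

/// Hypothetical non-allocating approach. Only valid when BOTH inputs are
/// ASCII: `eq_ignore_ascii_case` folds `A-Z`/`a-z` and compares every
/// other byte exactly.
fn ends_with_ignore_ascii_case(value: &str, suffix: &str) -> bool {
    let (v, s) = (value.as_bytes(), suffix.as_bytes());
    v.len() >= s.len() && v[v.len() - s.len()..].eq_ignore_ascii_case(s)
}
```

The two functions can disagree on non-ASCII input such as `ß`, which `to_uppercase` expands to `SS`, so the byte-wise version is only a drop-in replacement once both sides are known to be ASCII.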

@alamb (Contributor) left a comment:

Do the existing tests cover all the corner cases?

@tustvold (Contributor Author) commented Dec 9, 2022

I can add some more, give me a minute

@tustvold (Contributor Author) commented Dec 9, 2022

I think this might not work for non-ASCII characters that capitalize to ASCII characters 😢
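To make the concern concrete: a few non-ASCII characters map to ASCII letters under Unicode case conversion. For example, U+017F LATIN SMALL LETTER LONG S uppercases to `S`, and U+212A KELVIN SIGN lowercases to `k`. The hypothetical helpers below are a sketch, not the kernel's code; they show that a byte-wise ASCII comparison and a Unicode-aware one disagree on such input:

```rust
/// Byte-wise, ASCII-only case-insensitive equality: non-ASCII bytes must
/// match exactly, so a non-ASCII char never equals an ASCII letter.
fn ascii_ci_eq(a: &str, b: &str) -> bool {
    a.as_bytes().eq_ignore_ascii_case(b.as_bytes())
}

/// Unicode-aware equality using the full lowercase mappings applied by
/// `str::to_lowercase` (e.g. U+212A KELVIN SIGN becomes ASCII 'k').
fn unicode_lowercase_eq(a: &str, b: &str) -> bool {
    a.to_lowercase() == b.to_lowercase()
}
```

Under `unicode_lowercase_eq`, `"\u{212A}elvin"` equals `"kelvin"`; under `ascii_ci_eq` it does not, because the Kelvin sign is three non-ASCII bytes.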

@tustvold marked this pull request as draft December 9, 2022 15:06

@alamb (Contributor) commented Dec 9, 2022

Perhaps you can also check that left is entirely ASCII (which is a fairly common case)

@tustvold (Contributor Author) commented Dec 9, 2022

Interestingly the regex crate doesn't appear to handle this correctly... 🤔

@tustvold (Contributor Author) commented Dec 9, 2022

> I think this might not work for non-ASCII characters that capitalize to ASCII characters

So it turns out the way we were previously handling this was actually incorrect, so this is not only faster but also more correct 😆 (see #3311)

@tustvold marked this pull request as ready for review December 9, 2022 16:01

@tustvold (Contributor Author) commented Dec 9, 2022

Aargh, there are Unicode characters whose lowercase form is ASCII... Guess I will need to check whether the strings are ASCII, and use the regex if not
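A sketch of the guard described above, assuming the fast path only covers the equals, `prefix%` and `%suffix` pattern shapes, and that returning `None` means "fall back to the case-insensitive regex". `ilike_ascii_fast_path` and `has_like_wildcard` are hypothetical names; treating `\` as special is a conservative assumption so that escaped patterns always take the regex path:

```rust
/// Hypothetical helper: does `s` contain a LIKE wildcard (`%`, `_`) or an
/// escape character? (Treating `\` as special is a conservative assumption.)
fn has_like_wildcard(s: &str) -> bool {
    s.contains(|c: char| c == '%' || c == '_' || c == '\\')
}

/// Hypothetical ASCII fast path for `value ILIKE pattern`.
/// Returns `Some(result)` when the fast path applies, or `None` when the
/// caller should fall back to the case-insensitive regex.
fn ilike_ascii_fast_path(value: &str, pattern: &str) -> Option<bool> {
    // Non-ASCII on either side may involve Unicode case mappings
    // (e.g. U+212A KELVIN SIGN lowercases to 'k'), so bail out.
    if !value.is_ascii() || !pattern.is_ascii() {
        return None;
    }
    let v = value.as_bytes();
    if !has_like_wildcard(pattern) {
        // "abc": case-insensitive equality
        return Some(v.eq_ignore_ascii_case(pattern.as_bytes()));
    }
    if let Some(prefix) = pattern.strip_suffix('%') {
        if !has_like_wildcard(prefix) {
            // "abc%": case-insensitive starts_with
            let p = prefix.as_bytes();
            return Some(v.len() >= p.len() && v[..p.len()].eq_ignore_ascii_case(p));
        }
    }
    if let Some(suffix) = pattern.strip_prefix('%') {
        if !has_like_wildcard(suffix) {
            // "%abc": case-insensitive ends_with
            let s = suffix.as_bytes();
            return Some(v.len() >= s.len() && v[v.len() - s.len()..].eq_ignore_ascii_case(s));
        }
    }
    // Anything else, e.g. "%abc%" or "a_c", is left to the regex.
    None
}
```

Checking `is_ascii` on both sides is what makes the byte-wise comparison safe: once neither string contains a non-ASCII character, Unicode case mappings and ASCII case folding coincide.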

@tustvold marked this pull request as draft December 9, 2022 16:21

@tustvold (Contributor Author) commented Dec 9, 2022

This is currently blocked on #3313, which will allow the kernel to assume a GenericStringArray, which in turn will allow ASCII verification in one pass.
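The Arrow columnar string layout stores every value of a string array in one contiguous values buffer, delimited by an offsets buffer, so all values can be checked with a single `is_ascii` call over that byte range rather than one call per value. A simplified sketch of the idea; `StringColumn` is a hypothetical stand-in, not the arrow-rs `GenericStringArray` API:

```rust
/// Hypothetical, simplified stand-in for Arrow's variable-length string
/// layout: value `i` occupies `values[offsets[i]..offsets[i + 1]]`.
/// This is NOT the arrow-rs `GenericStringArray` API.
struct StringColumn {
    offsets: Vec<usize>,
    values: Vec<u8>,
}

impl StringColumn {
    fn from_strs(strs: &[&str]) -> Self {
        let mut offsets = vec![0];
        let mut values = Vec::new();
        for s in strs {
            values.extend_from_slice(s.as_bytes());
            offsets.push(values.len());
        }
        Self { offsets, values }
    }

    fn len(&self) -> usize {
        self.offsets.len() - 1
    }

    fn value(&self, i: usize) -> &str {
        // Safe to unwrap here: `from_strs` only ever stores valid UTF-8.
        std::str::from_utf8(&self.values[self.offsets[i]..self.offsets[i + 1]]).unwrap()
    }

    /// One pass over the contiguous bytes referenced by the offsets,
    /// instead of one `is_ascii` call per value.
    fn all_ascii(&self) -> bool {
        let start = self.offsets[0];
        let end = self.offsets[self.len()];
        self.values[start..end].is_ascii()
    }
}
```

This works because every byte of a multi-byte UTF-8 character is at least 0x80, so the concatenated bytes are all ASCII exactly when each value is. Checking only `offsets[0]..offsets[len]` rather than the whole buffer avoids rejecting a sliced array because of non-ASCII bytes that lie outside the slice.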

@tustvold (Contributor Author) commented Dec 9, 2022

ilike_utf8 scalar equals
                        time:   [271.37 µs 271.48 µs 271.62 µs]
                        change: [-87.932% -87.924% -87.914%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 13 outliers among 100 measurements (13.00%)
  2 (2.00%) low mild
  4 (4.00%) high mild
  7 (7.00%) high severe

Benchmarking ilike_utf8 scalar contains: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 9.1s, enable flat sampling, or reduce sample count to 50.
ilike_utf8 scalar contains
                        time:   [1.8028 ms 1.8037 ms 1.8048 ms]
                        change: [-56.517% -56.488% -56.455%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
  5 (5.00%) high mild
  5 (5.00%) high severe

ilike_utf8 scalar ends with
                        time:   [287.38 µs 287.52 µs 287.65 µs]
                        change: [-87.759% -87.745% -87.729%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high severe

ilike_utf8 scalar starts with
                        time:   [273.73 µs 273.85 µs 273.98 µs]
                        change: [-88.337% -88.328% -88.318%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
  1 (1.00%) low mild
  7 (7.00%) high mild
  3 (3.00%) high severe

Benchmarking ilike_utf8 scalar complex: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 9.3s, enable flat sampling, or reduce sample count to 50.
ilike_utf8 scalar complex
                        time:   [1.8338 ms 1.8349 ms 1.8361 ms]
                        change: [-4.1219% -3.9358% -3.7411%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
  4 (4.00%) high mild
  5 (5.00%) high severe

nilike_utf8 scalar equals
                        time:   [257.80 µs 257.85 µs 257.92 µs]
                        change: [-89.265% -89.246% -89.211%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
  4 (4.00%) high mild
  5 (5.00%) high severe

Benchmarking nilike_utf8 scalar contains: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 9.2s, enable flat sampling, or reduce sample count to 50.
nilike_utf8 scalar contains
                        time:   [1.8283 ms 1.8293 ms 1.8303 ms]
                        change: [-57.048% -56.996% -56.926%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  2 (2.00%) high mild
  4 (4.00%) high severe

nilike_utf8 scalar ends with
                        time:   [256.38 µs 256.46 µs 256.54 µs]
                        change: [-89.212% -89.192% -89.160%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  4 (4.00%) high mild
  4 (4.00%) high severe

nilike_utf8 scalar starts with
                        time:   [262.65 µs 262.70 µs 262.77 µs]
                        change: [-88.974% -88.958% -88.927%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  1 (1.00%) high mild
  3 (3.00%) high severe

Benchmarking nilike_utf8 scalar complex: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 9.3s, enable flat sampling, or reduce sample count to 50.
nilike_utf8 scalar complex
                        time:   [1.8537 ms 1.8547 ms 1.8559 ms]
                        change: [-3.5514% -3.4042% -3.2212%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 13 outliers among 100 measurements (13.00%)
  2 (2.00%) low mild
  3 (3.00%) high mild
  8 (8.00%) high severe

ilike_utf8_scalar_dyn dictionary[10] string[4])
                        time:   [59.817 µs 59.887 µs 59.968 µs]
                        change: [-97.558% -97.553% -97.546%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
  5 (5.00%) high mild
  7 (7.00%) high severe

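So even with the extra `is_ascii` check, this is still a significant speedup: roughly 88-89% for the equals, starts with and ends with cases, about 57% for contains, and about 97.5% for the dictionary case.

@tustvold marked this pull request as ready for review December 9, 2022 20:34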
@@ -1272,6 +1270,101 @@ mod tests {
vec![true, false, false, false]
);

// We only implement loose matching
Contributor review comment:

When I ran these tests without the changes in this PR I got a bunch of failures. Is that expected?

failures:

---- like::tests::test_utf8_array_ilike_unicode stdout ----
thread 'like::tests::test_utf8_array_ilike_unicode' panicked at 'assertion failed: `(left == right)`
  left: `true`,
 right: `false`: unexpected result when comparing FFkoß at position 0 to FFkoSS ', arrow-string/src/like.rs:1279:5
stack backtrace:
   0: rust_begin_unwind
             at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/std/src/panicking.rs:584:5
   1: core::panicking::panic_fmt
             at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/core/src/panicking.rs:142:14
   2: core::panicking::assert_failed_inner
             at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/core/src/panicking.rs:218:23
   3: core::panicking::assert_failed
             at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/core/src/panicking.rs:181:5
   4: arrow_string::like::tests::test_utf8_array_ilike_unicode
             at ./src/like.rs:1279:5
   5: arrow_string::like::tests::test_utf8_array_ilike_unicode::{{closure}}
             at ./src/like.rs:917:13
   6: core::ops::function::FnOnce::call_once
             at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/core/src/ops/function.rs:248:5
   7: core::ops::function::FnOnce::call_once
             at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/core/src/ops/function.rs:248:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.

---- like::tests::test_utf8_array_ilike_unicode_contains stdout ----
thread 'like::tests::test_utf8_array_ilike_unicode_contains' panicked at 'assertion failed: `(left == right)`
  left: `true`,
 right: `false`: unexpected result when comparing sdlkdfFkoßsdfs at position 0 to %FFkoSS% ', arrow-string/src/like.rs:1331:5
stack backtrace:
   0: rust_begin_unwind
             at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/std/src/panicking.rs:584:5
   1: core::panicking::panic_fmt
             at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/core/src/panicking.rs:142:14
   2: core::panicking::assert_failed_inner
             at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/core/src/panicking.rs:218:23
   3: core::panicking::assert_failed
             at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/core/src/panicking.rs:181:5
   4: arrow_string::like::tests::test_utf8_array_ilike_unicode_contains
             at ./src/like.rs:1331:5
   5: arrow_string::like::tests::test_utf8_array_ilike_unicode_contains::{{closure}}
             at ./src/like.rs:917:13
   6: core::ops::function::FnOnce::call_once
             at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/core/src/ops/function.rs:248:5
   7: core::ops::function::FnOnce::call_once
             at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/core/src/ops/function.rs:248:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.

---- like::tests::test_utf8_array_ilike_unicode_contains_dyn stdout ----
thread 'like::tests::test_utf8_array_ilike_unicode_contains_dyn' panicked at 'assertion failed: `(left == right)`
  left: `true`,
 right: `false`: unexpected result when comparing sdlkdfFkoßsdfs at position 0 to %FFkoSS% ', arrow-string/src/like.rs:1331:5
stack backtrace:
   0: rust_begin_unwind
             at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/std/src/panicking.rs:584:5
   1: core::panicking::panic_fmt
             at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/core/src/panicking.rs:142:14
   2: core::panicking::assert_failed_inner
             at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/core/src/panicking.rs:218:23
   3: core::panicking::assert_failed
             at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/core/src/panicking.rs:181:5
   4: arrow_string::like::tests::test_utf8_array_ilike_unicode_contains_dyn
             at ./src/like.rs:1331:5
   5: arrow_string::like::tests::test_utf8_array_ilike_unicode_contains_dyn::{{closure}}
             at ./src/like.rs:917:13
   6: core::ops::function::FnOnce::call_once
             at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/core/src/ops/function.rs:248:5
   7: core::ops::function::FnOnce::call_once
             at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/core/src/ops/function.rs:248:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.

---- like::tests::test_utf8_array_ilike_unicode_dyn stdout ----
thread 'like::tests::test_utf8_array_ilike_unicode_dyn' panicked at 'assertion failed: `(left == right)`
  left: `true`,
 right: `false`: unexpected result when comparing FFkoß at position 0 to FFkoSS ', arrow-string/src/like.rs:1279:5
stack backtrace:
   0: rust_begin_unwind
             at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/std/src/panicking.rs:584:5
   1: core::panicking::panic_fmt
             at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/core/src/panicking.rs:142:14
   2: core::panicking::assert_failed_inner
             at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/core/src/panicking.rs:218:23
   3: core::panicking::assert_failed
             at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/core/src/panicking.rs:181:5
   4: arrow_string::like::tests::test_utf8_array_ilike_unicode_dyn
             at ./src/like.rs:1279:5
   5: arrow_string::like::tests::test_utf8_array_ilike_unicode_dyn::{{closure}}
             at ./src/like.rs:917:13
   6: core::ops::function::FnOnce::call_once
             at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/core/src/ops/function.rs:248:5
   7: core::ops::function::FnOnce::call_once
             at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/core/src/ops/function.rs:248:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.

---- like::tests::test_utf8_array_ilike_unicode_ends stdout ----
thread 'like::tests::test_utf8_array_ilike_unicode_ends' panicked at 'assertion failed: `(left == right)`
  left: `true`,
 right: `false`: unexpected result when comparing sdlkdfFFkoß at position 0 to %FFkoSS ', arrow-string/src/like.rs:1311:5
stack backtrace:
   0: rust_begin_unwind
             at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/std/src/panicking.rs:584:5
   1: core::panicking::panic_fmt
             at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/core/src/panicking.rs:142:14
   2: core::panicking::assert_failed_inner
             at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/core/src/panicking.rs:218:23
   3: core::panicking::assert_failed
             at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/core/src/panicking.rs:181:5
   4: arrow_string::like::tests::test_utf8_array_ilike_unicode_ends
             at ./src/like.rs:924:21
   5: arrow_string::like::tests::test_utf8_array_ilike_unicode_ends::{{closure}}
             at ./src/like.rs:917:13
   6: core::ops::function::FnOnce::call_once
             at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/core/src/ops/function.rs:248:5
   7: core::ops::function::FnOnce::call_once
             at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/core/src/ops/function.rs:248:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.

---- like::tests::test_utf8_array_ilike_unicode_ends_dyn stdout ----
thread 'like::tests::test_utf8_array_ilike_unicode_ends_dyn' panicked at 'assertion failed: `(left == right)`
  left: `true`,
 right: `false`: unexpected result when comparing sdlkdfFFkoß at position 0 to %FFkoSS ', arrow-string/src/like.rs:1311:5
stack backtrace:
   0: rust_begin_unwind
             at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/std/src/panicking.rs:584:5
   1: core::panicking::panic_fmt
             at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/core/src/panicking.rs:142:14
   2: core::panicking::assert_failed_inner
             at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/core/src/panicking.rs:218:23
   3: core::panicking::assert_failed
             at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/core/src/panicking.rs:181:5
   4: arrow_string::like::tests::test_utf8_array_ilike_unicode_ends_dyn
             at ./src/like.rs:924:21
   5: arrow_string::like::tests::test_utf8_array_ilike_unicode_ends_dyn::{{closure}}
             at ./src/like.rs:917:13
   6: core::ops::function::FnOnce::call_once
             at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/core/src/ops/function.rs:248:5
   7: core::ops::function::FnOnce::call_once
             at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/core/src/ops/function.rs:248:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.

---- like::tests::test_utf8_array_ilike_unicode_starts stdout ----
thread 'like::tests::test_utf8_array_ilike_unicode_starts' panicked at 'assertion failed: `(left == right)`
  left: `true`,
 right: `false`: unexpected result when comparing FFkoßsdlkdf at position 0 to FFkoSS% ', arrow-string/src/like.rs:1291:5
stack backtrace:
   0: rust_begin_unwind
             at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/std/src/panicking.rs:584:5
   1: core::panicking::panic_fmt
             at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/core/src/panicking.rs:142:14
   2: core::panicking::assert_failed_inner
             at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/core/src/panicking.rs:218:23
   3: core::panicking::assert_failed
             at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/core/src/panicking.rs:181:5
   4: arrow_string::like::tests::test_utf8_array_ilike_unicode_starts
             at ./src/like.rs:1291:5
   5: arrow_string::like::tests::test_utf8_array_ilike_unicode_starts::{{closure}}
             at ./src/like.rs:917:13
   6: core::ops::function::FnOnce::call_once
             at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/core/src/ops/function.rs:248:5
   7: core::ops::function::FnOnce::call_once
             at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/core/src/ops/function.rs:248:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.

---- like::tests::test_utf8_array_ilike_unicode_start_dyn stdout ----
thread 'like::tests::test_utf8_array_ilike_unicode_start_dyn' panicked at 'assertion failed: `(left == right)`
  left: `true`,
 right: `false`: unexpected result when comparing FFkoßsdlkdf at position 0 to FFkoSS% ', arrow-string/src/like.rs:1291:5
stack backtrace:
   0: rust_begin_unwind
             at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/std/src/panicking.rs:584:5
   1: core::panicking::panic_fmt
             at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/core/src/panicking.rs:142:14
   2: core::panicking::assert_failed_inner
             at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/core/src/panicking.rs:218:23
   3: core::panicking::assert_failed
             at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/core/src/panicking.rs:181:5
   4: arrow_string::like::tests::test_utf8_array_ilike_unicode_start_dyn
             at ./src/like.rs:1291:5
   5: arrow_string::like::tests::test_utf8_array_ilike_unicode_start_dyn::{{closure}}
             at ./src/like.rs:917:13
   6: core::ops::function::FnOnce::call_once
             at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/core/src/ops/function.rs:248:5
   7: core::ops::function::FnOnce::call_once
             at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/core/src/ops/function.rs:248:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.


failures:
    like::tests::test_utf8_array_ilike_unicode
    like::tests::test_utf8_array_ilike_unicode_contains
    like::tests::test_utf8_array_ilike_unicode_contains_dyn
    like::tests::test_utf8_array_ilike_unicode_dyn
    like::tests::test_utf8_array_ilike_unicode_ends
    like::tests::test_utf8_array_ilike_unicode_ends_dyn
    like::tests::test_utf8_array_ilike_unicode_start_dyn
    like::tests::test_utf8_array_ilike_unicode_starts

Contributor Author (@tustvold) replied Dec 9, 2022:

Yes, see #3311: the previous special-case logic was incorrect, as it changed the semantics relative to the regex.
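The first failure above (`FFkoß` vs `FFkoSS`) can be reproduced in isolation: `str::to_uppercase` applies full Unicode case mappings, so `ß` becomes the two characters `SS`, whereas a character-by-character comparison can never equate one character with two. A hypothetical illustration; `eq_via_uppercase` is only a sketch of the old special case, and `eq_char_by_char` is a rough approximation of a case-insensitive regex, assumed here to map characters one-to-one:

```rust
/// Sketch of the pre-PR special case: uppercase both sides, then compare.
/// `str::to_uppercase` applies full Unicode mappings, so 'ß' becomes "SS".
fn eq_via_uppercase(value: &str, pattern: &str) -> bool {
    value.to_uppercase() == pattern.to_uppercase()
}

/// Rough, hypothetical approximation of one-to-one case-insensitive
/// matching: characters are compared pairwise, so strings with a different
/// number of characters can never match.
fn eq_char_by_char(value: &str, pattern: &str) -> bool {
    value.chars().count() == pattern.chars().count()
        && value
            .chars()
            .zip(pattern.chars())
            .all(|(a, b)| a == b || Iterator::eq(a.to_lowercase(), b.to_lowercase()))
}
```

`eq_via_uppercase("FFkoß", "FFkoSS")` is `true` while `eq_char_by_char` returns `false` (5 characters vs 6), which mirrors the `left: true` / `right: false` mismatch in the unpatched test runs.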

@alamb (Contributor) left a comment:

LGTM -- cc @Dandandan

@tustvold merged commit 9e39f96 into apache:master Dec 9, 2022
Labels: arrow (Changes to the arrow crate)

Development: successfully merging this pull request may close these issues:

ILIKE Kernels Inconsistent Case Folding

2 participants