Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix instances of UB that cause tests to not pass under miri #878

Merged
merged 4 commits into from
Nov 1, 2021
Merged

Fix instances of UB that cause tests to not pass under miri #878

merged 4 commits into from
Nov 1, 2021

Conversation

saethlin
Copy link
Contributor

Which issue does this PR close?

Closes #877.

What changes are included in this PR?

This fixes the UB in the parquet bit_util modules by explicitly doing unaligned reads. This also fixes the UB in murmur_hash2_64a by converting groups of u8 into u64 with a mechanism that does not require unsafe code.

Are there any user-facing changes?

There are no API changes. The performance impact is mixed. It's a bit hard to imagine how this makes anything faster, but comforting that not all changes are regressions and none are major.

cargo bench --all-features before/after

arrow_array_reader/read Int32Array, plain encoded, mandatory, no NULLs - old
time: [3.9834 us 3.9850 us 3.9867 us]
change: [-1.1683% -1.1126% -1.0585%] (p = 0.00 < 0.05)
Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
2 (2.00%) high mild
arrow_array_reader/read Int32Array, plain encoded, mandatory, no NULLs - new
time: [2.6234 us 2.6241 us 2.6249 us]
change: [-1.5826% -1.5200% -1.4451%] (p = 0.00 < 0.05)
Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
2 (2.00%) low mild
1 (1.00%) high mild
2 (2.00%) high severe
arrow_array_reader/read Int32Array, plain encoded, optional, no NULLs - old
time: [82.701 us 83.250 us 83.935 us]
change: [+2.6263% +3.3494% +4.1849%] (p = 0.00 < 0.05)
Performance has regressed.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild
arrow_array_reader/read Int32Array, plain encoded, optional, no NULLs - new
time: [21.341 us 21.364 us 21.406 us]
change: [+0.1475% +0.3773% +0.6109%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 5 outliers among 100 measurements (5.00%)
1 (1.00%) high mild
4 (4.00%) high severe
arrow_array_reader/read Int32Array, plain encoded, optional, half NULLs - old
time: [198.46 us 198.62 us 198.79 us]
change: [-1.5860% -0.4918% +0.3755%] (p = 0.37 > 0.05)
No change in performance detected.
Found 5 outliers among 100 measurements (5.00%)
3 (3.00%) low mild
2 (2.00%) high severe
arrow_array_reader/read Int32Array, plain encoded, optional, half NULLs - new
time: [198.12 us 198.21 us 198.35 us]
change: [-0.7897% +0.7633% +2.0061%] (p = 0.34 > 0.05)
No change in performance detected.
Found 4 outliers among 100 measurements (4.00%)
4 (4.00%) high severe
arrow_array_reader/read Int32Array, dictionary encoded, mandatory, no NULLs - old
time: [26.411 us 26.416 us 26.420 us]
change: [-2.8843% -2.8415% -2.8043%] (p = 0.00 < 0.05)
Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
2 (2.00%) high mild
1 (1.00%) high severe
arrow_array_reader/read Int32Array, dictionary encoded, mandatory, no NULLs - new
time: [100.27 us 100.29 us 100.31 us]
change: [+0.0819% +0.1397% +0.1824%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 4 outliers among 100 measurements (4.00%)
1 (1.00%) low mild
3 (3.00%) high mild
arrow_array_reader/read Int32Array, dictionary encoded, optional, no NULLs - old
time: [103.82 us 103.85 us 103.88 us]
change: [-0.0918% -0.0199% +0.0564%] (p = 0.60 > 0.05)
No change in performance detected.
Found 7 outliers among 100 measurements (7.00%)
3 (3.00%) high mild
4 (4.00%) high severe
arrow_array_reader/read Int32Array, dictionary encoded, optional, no NULLs - new
time: [118.97 us 118.99 us 119.00 us]
change: [-0.6533% -0.5571% -0.4697%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 6 outliers among 100 measurements (6.00%)
4 (4.00%) high mild
2 (2.00%) high severe
arrow_array_reader/read Int32Array, dictionary encoded, optional, half NULLs - old
time: [209.75 us 209.93 us 210.14 us]
change: [-0.4593% -0.3419% -0.1866%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 2 outliers among 100 measurements (2.00%)
2 (2.00%) high severe
arrow_array_reader/read Int32Array, dictionary encoded, optional, half NULLs - new
time: [248.36 us 248.41 us 248.46 us]
change: [-0.6660% -0.6152% -0.5610%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 4 outliers among 100 measurements (4.00%)
2 (2.00%) low mild
2 (2.00%) high severe
Benchmarking arrow_array_reader/read StringArray, plain encoded, mandatory, no NULLs - old: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 5.4s, enable flat sampling, or reduce sample count to 60.
arrow_array_reader/read StringArray, plain encoded, mandatory, no NULLs - old
time: [1.0652 ms 1.0653 ms 1.0655 ms]
change: [-0.4593% -0.3777% -0.2848%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 7 outliers among 100 measurements (7.00%)
1 (1.00%) low mild
5 (5.00%) high mild
1 (1.00%) high severe
arrow_array_reader/read StringArray, plain encoded, mandatory, no NULLs - new
time: [129.79 us 129.82 us 129.84 us]
change: [-2.1307% -1.8979% -1.6514%] (p = 0.00 < 0.05)
Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high mild
Benchmarking arrow_array_reader/read StringArray, plain encoded, optional, no NULLs - old: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 5.8s, enable flat sampling, or reduce sample count to 60.
arrow_array_reader/read StringArray, plain encoded, optional, no NULLs - old
time: [1.1470 ms 1.1471 ms 1.1472 ms]
change: [-0.0405% +0.0185% +0.0813%] (p = 0.57 > 0.05)
No change in performance detected.
Found 12 outliers among 100 measurements (12.00%)
1 (1.00%) low severe
2 (2.00%) low mild
2 (2.00%) high mild
7 (7.00%) high severe
arrow_array_reader/read StringArray, plain encoded, optional, no NULLs - new
time: [150.03 us 150.36 us 150.84 us]
change: [+0.5729% +0.7597% +0.9920%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 10 outliers among 100 measurements (10.00%)
1 (1.00%) high mild
9 (9.00%) high severe
Benchmarking arrow_array_reader/read StringArray, plain encoded, optional, half NULLs - old: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 5.1s, enable flat sampling, or reduce sample count to 70.
arrow_array_reader/read StringArray, plain encoded, optional, half NULLs - old
time: [1.0030 ms 1.0031 ms 1.0032 ms]
change: [-0.5887% -0.5500% -0.5089%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 9 outliers among 100 measurements (9.00%)
3 (3.00%) low severe
2 (2.00%) low mild
2 (2.00%) high mild
2 (2.00%) high severe
arrow_array_reader/read StringArray, plain encoded, optional, half NULLs - new
time: [313.06 us 313.10 us 313.13 us]
change: [+0.6358% +0.6714% +0.7022%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 6 outliers among 100 measurements (6.00%)
2 (2.00%) low severe
1 (1.00%) low mild
2 (2.00%) high mild
1 (1.00%) high severe
Benchmarking arrow_array_reader/read StringArray, dictionary encoded, mandatory, no NULLs - old: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 5.2s, enable flat sampling, or reduce sample count to 60.
arrow_array_reader/read StringArray, dictionary encoded, mandatory, no NULLs - old
time: [1.0317 ms 1.0319 ms 1.0320 ms]
change: [+1.1671% +1.2760% +1.3809%] (p = 0.00 < 0.05)
Performance has regressed.
Found 8 outliers among 100 measurements (8.00%)
2 (2.00%) low mild
3 (3.00%) high mild
3 (3.00%) high severe
arrow_array_reader/read StringArray, dictionary encoded, mandatory, no NULLs - new
time: [143.50 us 143.52 us 143.56 us]
change: [-0.6198% -0.5818% -0.5399%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 11 outliers among 100 measurements (11.00%)
1 (1.00%) low severe
5 (5.00%) high mild
5 (5.00%) high severe
Benchmarking arrow_array_reader/read StringArray, dictionary encoded, optional, no NULLs - old: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 5.6s, enable flat sampling, or reduce sample count to 60.
arrow_array_reader/read StringArray, dictionary encoded, optional, no NULLs - old
time: [1.1055 ms 1.1057 ms 1.1058 ms]
change: [+1.1655% +1.2473% +1.3045%] (p = 0.00 < 0.05)
Performance has regressed.
Found 5 outliers among 100 measurements (5.00%)
1 (1.00%) low mild
2 (2.00%) high mild
2 (2.00%) high severe
arrow_array_reader/read StringArray, dictionary encoded, optional, no NULLs - new
time: [162.32 us 162.34 us 162.36 us]
change: [-0.6968% -0.6660% -0.6240%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 8 outliers among 100 measurements (8.00%)
5 (5.00%) high mild
3 (3.00%) high severe
arrow_array_reader/read StringArray, dictionary encoded, optional, half NULLs - old
time: [972.11 us 972.42 us 972.68 us]
change: [+0.1393% +0.2384% +0.3296%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 3 outliers among 100 measurements (3.00%)
2 (2.00%) low mild
1 (1.00%) high mild
arrow_array_reader/read StringArray, dictionary encoded, optional, half NULLs - new
time: [309.08 us 309.11 us 309.15 us]
change: [+1.2438% +1.3455% +1.4354%] (p = 0.00 < 0.05)
Performance has regressed.
Found 8 outliers among 100 measurements (8.00%)
1 (1.00%) low mild
5 (5.00%) high mild
2 (2.00%) high severe

 Running unittests (/home/ben/arrow-rs/target/release/deps/arrow_writer-0035a5c9607ed91b)

WARNING: HTML report generation will become a non-default optional feature in Criterion.rs 0.4.0.
This feature is being moved to cargo-criterion (https://github.com/bheisler/cargo-criterion) and will be optional in a future version of Criterion.rs. To silence this warning, either switch to cargo-criterion or enable the 'html_reports' feature in your Cargo.toml.

Gnuplot not found, using plotters backend
Benchmarking write_batch primitive/4096 values primitive: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 8.0s, enable flat sampling, or reduce sample count to 50.
write_batch primitive/4096 values primitive
time: [1.5769 ms 1.5772 ms 1.5774 ms]
thrpt: [111.73 MiB/s 111.75 MiB/s 111.77 MiB/s]
change:
time: [+1.2878% +1.3426% +1.3959%] (p = 0.00 < 0.05)
thrpt: [-1.3767% -1.3249% -1.2715%]
Performance has regressed.
Found 5 outliers among 100 measurements (5.00%)
3 (3.00%) high mild
2 (2.00%) high severe
Benchmarking write_batch primitive/4096 values primitive non-null: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 7.2s, enable flat sampling, or reduce sample count to 50.
write_batch primitive/4096 values primitive non-null
time: [1.4060 ms 1.4064 ms 1.4067 ms]
thrpt: [125.30 MiB/s 125.33 MiB/s 125.36 MiB/s]
change:
time: [-1.5148% -1.4121% -1.3052%] (p = 0.00 < 0.05)
thrpt: [+1.3224% +1.4323% +1.5381%]
Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
6 (6.00%) high mild
4 (4.00%) high severe
write_batch primitive/4096 values bool
time: [94.494 us 94.521 us 94.551 us]
thrpt: [11.781 MiB/s 11.785 MiB/s 11.788 MiB/s]
change:
time: [-1.5791% -1.4230% -1.2750%] (p = 0.00 < 0.05)
thrpt: [+1.2914% +1.4435% +1.6044%]
Performance has improved.
Found 17 outliers among 100 measurements (17.00%)
1 (1.00%) low severe
1 (1.00%) low mild
3 (3.00%) high mild
12 (12.00%) high severe
write_batch primitive/4096 values bool non-null
time: [64.259 us 64.289 us 64.324 us]
thrpt: [17.317 MiB/s 17.326 MiB/s 17.334 MiB/s]
change:
time: [-4.3881% -4.0095% -3.5526%] (p = 0.00 < 0.05)
thrpt: [+3.6834% +4.1770% +4.5895%]
Performance has improved.
Found 14 outliers among 100 measurements (14.00%)
5 (5.00%) high mild
9 (9.00%) high severe
write_batch primitive/4096 values string
time: [787.41 us 788.07 us 788.64 us]
thrpt: [100.86 MiB/s 100.93 MiB/s 101.02 MiB/s]
change:
time: [-0.2931% +0.4413% +1.4753%] (p = 0.50 > 0.05)
thrpt: [-1.4539% -0.4394% +0.2939%]
No change in performance detected.
Found 17 outliers among 100 measurements (17.00%)
6 (6.00%) low severe
2 (2.00%) low mild
5 (5.00%) high mild
4 (4.00%) high severe
write_batch primitive/4096 values string non-null
time: [822.71 us 823.52 us 824.30 us]
thrpt: [96.499 MiB/s 96.590 MiB/s 96.686 MiB/s]
change:
time: [-0.4183% -0.2902% -0.1596%] (p = 0.00 < 0.05)
thrpt: [+0.1598% +0.2911% +0.4200%]
Change within noise threshold.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) high severe

Benchmarking write_batch nested/4096 values primitive list: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 8.6s, enable flat sampling, or reduce sample count to 50.
write_batch nested/4096 values primitive list
time: [1.6931 ms 1.6933 ms 1.6936 ms]
thrpt: [96.624 MiB/s 96.640 MiB/s 96.655 MiB/s]
change:
time: [-0.9455% -0.8252% -0.7078%] (p = 0.00 < 0.05)
thrpt: [+0.7129% +0.8321% +0.9546%]
Change within noise threshold.
Found 3 outliers among 100 measurements (3.00%)
1 (1.00%) low mild
2 (2.00%) high severe
write_batch nested/4096 values primitive list non-null
time: [2.0181 ms 2.0191 ms 2.0202 ms]
thrpt: [95.746 MiB/s 95.800 MiB/s 95.849 MiB/s]
change:
time: [-0.6942% -0.6250% -0.5530%] (p = 0.00 < 0.05)
thrpt: [+0.5561% +0.6289% +0.6991%]
Change within noise threshold.
Found 7 outliers among 100 measurements (7.00%)
2 (2.00%) low mild
3 (3.00%) high mild
2 (2.00%) high severe

@github-actions github-actions bot added the parquet Changes to the parquet crate label Oct 29, 2021
@saethlin
Copy link
Contributor Author

This should also close #614

Also I have no idea what's going on with the miri run in CI. I've never seen that error before 😔

@alamb
Copy link
Contributor

alamb commented Oct 29, 2021

Thanks @saethlin -- I'll check this out carefully in the next day or two.

@alamb alamb requested review from sunchao and nevi-me October 29, 2021 11:01
@alamb
Copy link
Contributor

alamb commented Oct 29, 2021

Filed #879 to track unrelated MIRI failure -- we'll get it cleaned up

@jhorstmann
Copy link
Contributor

Changes here look good to me 🚀

@alamb
Copy link
Contributor

alamb commented Oct 31, 2021

I merged from the latest apache/master to this branch to get the fix for #879 -- I hope the CI will now pass

@codecov-commenter
Copy link

Codecov Report

Merging #878 (f2060dc) into master (2310936) will increase coverage by 0.00%.
The diff coverage is 100.00%.

Impacted file tree graph

@@           Coverage Diff           @@
##           master     #878   +/-   ##
=======================================
  Coverage   82.45%   82.45%           
=======================================
  Files         168      168           
  Lines       48231    48229    -2     
=======================================
  Hits        39767    39767           
+ Misses       8464     8462    -2     
Impacted Files Coverage Δ
parquet/src/util/bit_packing.rs 99.96% <ø> (ø)
parquet/src/util/bit_util.rs 93.18% <ø> (ø)
parquet/src/util/hash_util.rs 95.65% <100.00%> (-0.13%) ⬇️
parquet_derive/src/parquet_field.rs 67.04% <0.00%> (ø)
parquet/src/encodings/encoding.rs 94.67% <0.00%> (+0.19%) ⬆️
arrow/src/datatypes/datatype.rs 65.80% <0.00%> (+0.43%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2310936...f2060dc. Read the comment docs.

@alamb alamb merged commit 4666b5f into apache:master Nov 1, 2021
@alamb
Copy link
Contributor

alamb commented Nov 1, 2021

Thanks again @saethlin !

@alamb alamb mentioned this pull request Nov 1, 2021
alamb added a commit that referenced this pull request Nov 4, 2021
* Fix unaligned access in bit-packing

* Fix creation of unaligned reference in murmur_hash2_64a

* Remove now-unnecessary unsafe

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
alamb added a commit that referenced this pull request Nov 5, 2021
)

* Fix unaligned access in bit-packing

* Fix creation of unaligned reference in murmur_hash2_64a

* Remove now-unnecessary unsafe

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

Co-authored-by: Ben Kimock <kimockb@gmail.com>
ovr pushed a commit to cube-js/arrow-rs that referenced this pull request Mar 29, 2024
* Fix unaligned access in bit-packing

* Fix creation of unaligned reference in murmur_hash2_64a

* Remove now-unnecessary unsafe

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
ovr pushed a commit to cube-js/arrow-rs that referenced this pull request Apr 2, 2024
* Fix unaligned access in bit-packing

* Fix creation of unaligned reference in murmur_hash2_64a

* Remove now-unnecessary unsafe

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
ovr pushed a commit to cube-js/arrow-rs that referenced this pull request Apr 2, 2024
* Fix unaligned access in bit-packing

* Fix creation of unaligned reference in murmur_hash2_64a

* Remove now-unnecessary unsafe

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
parquet Changes to the parquet crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Tests do not pass miri due to alignment issues
4 participants