Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

support compression for IPC with revamped feature flags #2369

Merged
merged 38 commits into from
Aug 14, 2022

Conversation

alamb
Copy link
Contributor

@alamb alamb commented Aug 8, 2022

Which issue does this PR close?

Closes #1855
Closes liukun4515#1
Closes #1709
Closes #70
Closes #2342

Rationale for this change

Not this is a new PR is based on @liukun4515 's work in #1855. Github is showing a massive diff in liukun4515#1 so I wanted to get this PR up so we could see what the actual change is proposed

The logic is unchanged, but I changed how the feature flagging works so that the code differences are isolated into the ipc_compression module

What changes are included in this PR?

  1. Add support for LZ4_STREAM and ZSTD under ipc_compression feature flag
  2. Tests for same

Are there any user-facing changes?

New feature flag, and support for compression

@alamb alamb changed the title Alamb/help feature flags support compression for IPC with revamped feature flags Aug 8, 2022
Copy link
Contributor

@liukun4515 liukun4515 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your effort and push this feature forward, and help me to refactor this model.

LGTM

@alamb
Copy link
Contributor Author

alamb commented Aug 11, 2022

Thanks for your effort and push this feature forward, and help me to refactor this model.

It was my pleasure -- thank you for figuring out the tests and the format that was required

@alamb alamb requested a review from tustvold August 11, 2022 10:19
@liukun4515
Copy link
Contributor

Thanks for your effort and push this feature forward, and help me to refactor this model.

It was my pleasure -- thank you for figuring out the tests and the format that was required

Also we need to enable the IT test in the arrow.
#1907
And I can try it.

I have not thought about it carefully, but how to enable the ipc_compression in the IT ci?

@liukun4515
Copy link
Contributor

This issue #2342 also can be closed.

Copy link
Contributor

@tustvold tustvold left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly just some minor nits

arrow/src/ipc/compression/codec.rs Outdated Show resolved Hide resolved
} else if decompressed_length == LENGTH_NO_COMPRESSED_DATA {
// no compression
let data = &input[(LENGTH_OF_PREFIX_DATA as usize)..];
Buffer::from(data)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just an observation, but this copy is kind of unfortunate (although existed before)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you thinking the alternate is to create a Buffer initially and write this code in terms of Buffer rather than &[u8]?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah... Not important for this PR, but it seems unfortunate the amount of memory copying we are doing, especially when the major design goal of the IPC spec is to avoid this 😅

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Filed #2437

}

impl CompressionCodec {
#[allow(clippy::ptr_arg)]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might be misunderstanding this lint, but I don't think it should be firing for this method

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I remove it and run clippy like

cargo clippy -p arrow --features=prettyprint,csv,ipc,test_utils --all-targets -- -D warnings

Clippy tells me:

error: writing `&mut Vec` instead of `&mut [_]` involves a new object where a slice will do
  --> arrow/src/ipc/compression/stub.rs:50:18
   |
50 |         _output: &mut Vec<u8>,
   |                  ^^^^^^^^^^^^ help: change this to: `&mut [u8]`
   |
   = note: `-D clippy::ptr-arg` implied by `-D warnings`
   = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#ptr_arg

Which while true in this case, is not true for the actual codec implementation, and they both need to have the same signature.

Copy link
Contributor

@tustvold tustvold Aug 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a very daft lint, &mut Vec<u8> is not the same as &mut [u8], in particular the latter must initialize memory... I can understand &[u8] vs &Vec<u8> but the mutability makes a difference... I somewhat assumed clippy wasn't being silly, guess I was wrong 😅

arrow/src/ipc/writer.rs Outdated Show resolved Hide resolved
}
// pad the tail of body data
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm guessing this previously wasn't needed because the buffers were already 8 byte aligned

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably -- I am not sure to be honest. This is code that was in @liukun4515 's original PR but it seemed reasonable to me

@@ -520,17 +593,20 @@ impl<W: Write> FileWriter<W> {
let data_gen = IpcDataGenerator::default();
let mut writer = BufWriter::new(writer);
// write magic to header
let mut header_size: usize = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
let mut header_size: usize = 0;
let header_size: usize = ARROW_MAGIC.len() + 2;

?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call -- I have updated in 2ed7ce3 (and added an assert to verify the currently implicit assumption that ARROW_MAGIC is 6 bytes long.

arrow/Cargo.toml Outdated Show resolved Hide resolved
// under the License.

//! Stubs that implement the same interface as ipc_compression
//! but always error.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

arrow/src/ipc/reader.rs Outdated Show resolved Hide resolved
arrow/src/ipc/writer.rs Outdated Show resolved Hide resolved
@tustvold tustvold merged commit 5e27d93 into apache:master Aug 14, 2022
@ursabot
Copy link

ursabot commented Aug 14, 2022

Benchmark runs are scheduled for baseline = 1c879ae and contender = 5e27d93. 5e27d93 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrow Changes to the arrow crate
Projects
None yet
5 participants