Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve arrow flight batch splitting and naming #3444

Merged
merged 3 commits into from
Jan 5, 2023

Conversation

alamb
Copy link
Contributor

@alamb alamb commented Jan 3, 2023

Which issue does this PR close?

re #3371

Rationale for this change

I wrote some tests for this code in https://github.com/influxdata/influxdb_iox/pull/6484 (actually covers a bug I fixed when upstreaming to arrow-rs)

Also, it would be nice to clarify that the units are in bytes and not some other unit (like rows, for example) as suggested by @carols10cents on https://github.com/influxdata/influxdb_iox/pull/6460

What changes are included in this PR?

  1. Add more tests for record batch splitting
  2. Change name of parameter from max_message_size to max_message_size_bytes

Are there any user-facing changes?

No : Note that this code has not been released yet, so there are no external API changes.

@github-actions github-actions bot added the arrow-flight Changes to the arrow-flight crate label Jan 3, 2023
@tustvold tustvold added the api-change Changes to the arrow API label Jan 4, 2023
arrow-flight/src/encode.rs Outdated Show resolved Hide resolved
max_batch_size: usize,
/// The maximum approximate target message size in bytes
/// (see details on [`Self::with_max_flight_data_size`]).
max_flight_data_size: usize,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I renamed this to max_flight_data_size and tried to clarify the intent more

@alamb alamb requested a review from tustvold January 4, 2023 22:32
max_flight_data_size_bytes: usize,
expected_sizes: Vec<usize>,
) {
let array: UInt64Array = (0..num_input_rows).collect();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It occurs to me that this splitting won't work for dictionaries, not really sure what we can do about that though... 🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 I guess we hope for small dictionaries

@alamb alamb merged commit e256e3d into apache:master Jan 5, 2023
@ursabot
Copy link

ursabot commented Jan 5, 2023

Benchmark runs are scheduled for baseline = 4a3b7e9 and contender = e256e3d. e256e3d is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api-change Changes to the arrow API arrow-flight Changes to the arrow-flight crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants