Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change pretty_format_batches to return Result<impl Display> #975

Merged
merged 5 commits into from
Dec 1, 2021

Conversation

matthewmturner
Copy link
Contributor

Which issue does this PR close?

Closes #951

Rationale for this change

What changes are included in this PR?

Are there any user-facing changes?

@github-actions github-actions bot added the arrow Changes to the arrow crate label Nov 24, 2021
@matthewmturner
Copy link
Contributor Author

@jimexist FYI ive started the work on this if you wan to check it out.

@codecov-commenter
Copy link

codecov-commenter commented Nov 24, 2021

Codecov Report

Merging #975 (a696026) into master (f6908bf) will decrease coverage by 0.00%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #975      +/-   ##
==========================================
- Coverage   82.31%   82.31%   -0.01%     
==========================================
  Files         168      168              
  Lines       48761    48763       +2     
==========================================
+ Hits        40136    40137       +1     
- Misses       8625     8626       +1     
Impacted Files Coverage Δ
arrow/src/datatypes/field.rs 53.37% <0.00%> (-0.31%) ⬇️
parquet/src/basic.rs 94.38% <0.00%> (-0.23%) ⬇️
parquet/src/encodings/encoding.rs 93.52% <0.00%> (-0.20%) ⬇️
parquet/src/file/writer.rs 93.30% <0.00%> (ø)
arrow/src/buffer/mutable.rs 89.47% <0.00%> (ø)
parquet/src/file/metadata.rs 92.96% <0.00%> (ø)
arrow/src/array/transform/mod.rs 85.24% <0.00%> (+0.27%) ⬆️
arrow/src/datatypes/datatype.rs 66.38% <0.00%> (+0.42%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f6908bf...a696026. Read the comment docs.

@@ -42,6 +43,12 @@ pub fn print_batches(results: &[RecordBatch]) -> Result<()> {
Ok(())
}

pub fn write_batches<W: Write>(buf: &mut W, results: &[RecordBatch]) -> Result<()> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder what you both would think about implementing Display instead of a new free function?

Something like

fn displayable(batches: &[RecordBatch]) -> impl Display {
...
}

Then we could use that directly in format calls, like

println!("Batches: {}", displayable(batches));

As well as writing:

write!(&mut w, "{}", displayable(batches));

Basically it feels to me like if we are going to add something new to pretty print it would be nice to make it as flexible as possible.

You can do this displayable trick with a newtype wrapper, for example https://github.com/apache/arrow-datafusion/blob/master/datafusion/src/logical_plan/display.rs#L96-L119

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alamb thanks for comment - that actually got me thinking about this differently.

first, couldnt we just do:

write!(&mut buf, "{}", pretty_format_batches(batches))

if so, im not even sure this PR is needed?

second, assuming that doesnt work could you just explain what the difference is between pretty_format_batches and the displayable function you proposed?

@jimexist what do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is an excellent point @matthewmturner

second, assuming that doesnt work could you just explain what the difference is between pretty_format_batches and the displayable function you proposed?

The only difference I am aware of is that pretty_format_batches requires a String (so allocates some memory and puts the formatted batches there). Thus it is not as efficient

Though now that you mention this, perhaps we could change pretty_format_batches to something like the following (basically to get rid of the to_string():

///! Create a visual representation of record batches
pub fn pretty_format_batches(results: &[RecordBatch]) -> Result<impl Display> {
    create_table(results)
}
///! Create a visual representation of columns
pub fn pretty_format_columns(col_name: &str, results: &[ArrayRef]) -> Result<impl Display> {
    create_column(col_name, results)
}

This would be a API change because callers would now have to call to_string() if they wanted a string, so instead of

let s  = pretty_format_batches(&batches)?;

It would look more like

let s  = pretty_format_batches(&batches)?.to_string();

But the benefit is that the String would no longer be created unless it was necessary

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thx @alamb, I think I agree and it makes sense.

And just to confirm, is the reason to return a type that implements Display instead of the Table or Column from comfy_table to give us more flexibility around that in the future?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And just to confirm, is the reason to return a type that implements Display instead of the Table or Column from comfy_table to give us more flexibility around that in the future?

Yes, I think so -- for example when we updated from pretty-table to comfy-table or upgraded versions of comfy-table the arrow API wouldn't change

@matthewmturner
Copy link
Contributor Author

matthewmturner commented Nov 29, 2021

im a little confused why a test is failing in CI.

locally i get an unused warning on std::fmt::Write but if i remove it i get error when using write! macro.

Ill need to look into this more.

@alamb
Copy link
Contributor

alamb commented Nov 29, 2021

Hi @matthewmturner

locally i get an unused warning on std::fmt::Write but if i remove it i get error when using write! macro.

I think that is due to the fact that the Write is only used in tests.

So if you change the code with something like

diff --git a/arrow/src/util/pretty.rs b/arrow/src/util/pretty.rs
index 144eb6b601..e0f3574ed2 100644
--- a/arrow/src/util/pretty.rs
+++ b/arrow/src/util/pretty.rs
@@ -19,7 +19,7 @@
 //! available unless `feature = "prettyprint"` is enabled.
 
 use crate::{array::ArrayRef, record_batch::RecordBatch};
-use std::fmt::{Display, Write};
+use std::fmt::Display;
 
 use comfy_table::{Cell, Table};
 
@@ -120,6 +120,7 @@ mod tests {
     use super::*;
     use crate::array::{DecimalBuilder, Int32Array};
     use std::sync::Arc;
+    use std::fmt::Write;
 
     #[test]
     fn test_pretty_format_batches() -> Result<()> {

I pushed a fix to your branch and resolved a conflict (I caused)

@matthewmturner
Copy link
Contributor Author

@alamb ugh sry should have picked that up. thank you.

@alamb
Copy link
Contributor

alamb commented Nov 29, 2021

@alamb ugh sry should have picked that up. thank you.

No problem -- that hit me many times when I was newer to rust and I found it very confusing

@alamb
Copy link
Contributor

alamb commented Nov 29, 2021

@matthewmturner -- interestingly the error reported in the tests only appeared for me when this branch got merged with master (there was a test with a logical conflict that was added for fixed size list formatting). Fixed (hopefully) in a696026

@alamb alamb changed the title Add function to write batches to buffer Change pretty_format_batches to return Result<impl Display> Nov 30, 2021
@alamb alamb added the api-change Changes to the arrow API label Nov 30, 2021
@alamb
Copy link
Contributor

alamb commented Nov 30, 2021

Any final thoughts @jimexist ?

@jimexist
Copy link
Member

jimexist commented Dec 1, 2021

thanks for the change - it looks much more flexible now

@jimexist jimexist merged commit e9be49d into apache:master Dec 1, 2021
@liukun4515
Copy link
Contributor

@alamb @jimexist will these changes be applied to the release version?

@alamb
Copy link
Contributor

alamb commented Dec 17, 2021

@alamb @jimexist will these changes be applied to the release version?

Since it is a breaking API change, the plan will be to release these changes in arrow 7.0.0 (due in min January 2022)

@liukun4515
Copy link
Contributor

Thanks for your reply.
I found the API broken issue when I change some codes in arrow-rs and apply my codes to datafusion.

@alamb
Copy link
Contributor

alamb commented Dec 18, 2021

I found the API broken issue when I change some codes in arrow-rs and apply my codes to datafusion.

DataFusion is effectively using the code from the active_release branch of arrow-rs rather than master -- so if you want to try you code against datafusion you can start from active_release and apply your changes to arrow-rs there

@liukun4515
Copy link
Contributor

thanks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api-change Changes to the arrow API arrow Changes to the arrow crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

allow pretty::print_batches to take bytes sink instead of only stdout
5 participants