Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal: Remove Accumulator::update and Accumulator::merge #1549

Closed
alamb opened this issue Jan 11, 2022 · 4 comments · Fixed by #1582
Closed

Proposal: Remove Accumulator::update and Accumulator::merge #1549

alamb opened this issue Jan 11, 2022 · 4 comments · Fixed by #1582
Labels
datafusion Changes in the datafusion crate enhancement New feature or request

Comments

@alamb
Copy link
Contributor

alamb commented Jan 11, 2022

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

It is very non obvious that either one or the other of Accumulator::update or Accumulator::update_batch are called. This results in unfortunate things like #1525 (implementation in terms of ScalarValue) only then to be followed by #1546 (implementation in terms of Array)

In addition, update_batch and merge_batch will always be the most performant, so they should always be what is implemented.

The same issue applies to Accumulator::merge and Accumulator::merge_batch

I tried making this clearer in the docs via #1542

Describe the solution you'd like
I think a clearer thing to do would be to remove update and merge entirely from the traits and ensure the examples are clear to keep the barrier to entry as low as possible.

Additional context
See additional context on #1547 (review)

@liukun4515
Copy link
Contributor

liukun4515 commented Jan 12, 2022

Can we change the default implementation of the accumulator trait to this?
@alamb

pub trait Accumulator{

fn update(&mut self, values: &[ScalarValue]) -> Result<()> {
   // convert the values to &[Arrayref]
   // call the update_batch method
}
fn update_batch(&mut self, values: &[ArrayRef]) -> Result<()>;
}

@realno
Copy link
Contributor

realno commented Jan 12, 2022

+1 on this proposal.

@alamb
Copy link
Contributor Author

alamb commented Jan 13, 2022

@liukun4515

Can we change the default implementation of the accumulator trait to this?

We could change the default implementation, however, I am not sure what purpose that would serve -- the code in datafusion that calls Accumulator only (should) use Accumulator::update_batch

I was thinking of moving the current default implementation of Accumulator::update_batch which calls update() with ScalarValues to some sort of public example or adapter function, so anyone who had existing Accumulator implementations can continue to use them until they move over to update_batch and merge_batch

@liukun4515
Copy link
Contributor

It makes sense to me.
Removing the update and merge is more clear for developers.

@alamb alamb added the datafusion Changes in the datafusion crate label Feb 10, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
datafusion Changes in the datafusion crate enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants