Summarize function in sharded db #4

Open
e9gille opened this issue Apr 11, 2016 · 3 comments

@e9gille
Collaborator

e9gille commented Apr 11, 2016

The Summarize function in sharded databases doesn't "summarize" the individual shard results. It also attempts to re-summarize partial results on WS FULL, but I believe it does so incorrectly, because it applies the same summary function that was originally used on the raw data.

@e9gille
Collaborator Author

e9gille commented Apr 11, 2016

Added test cases to highlight the issue:
#5

@mkromberg
Contributor

I believe the WS FULL implementation is currently correct, but only because the only summary functions supported are count, sum, max and min. If you needed to add avg or similar functions, you'd need to do more work. I will look at the sharding issue.

@e9gille
Collaborator Author

e9gille commented Apr 11, 2016

Well, count would be incorrect as well, since it should sum up the individual counts when re-summarizing. But it is buggy anyway, because the groupfn takes vectors of columns as its argument. I've fixed it in my fork and added new functions to re-summarize.
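
To illustrate the count problem, here is a minimal sketch in Python rather than APL. It is not the project's code: the names `RAW`, `MERGE`, `summarize` and `resummarize` are made up for this example, and the shards are plain lists. It only shows that partial results need their own combining function, which for count is a sum.

```python
# Hypothetical illustration; none of these names come from the project.

# Functions applied to raw values within one shard (or one chunk).
RAW = {"count": len, "sum": sum, "max": max, "min": min}

# Functions applied to the partial results to combine them.
# Only count differs: partial counts have to be added up.
MERGE = {"count": sum, "sum": sum, "max": max, "min": min}


def summarize(values, fn):
    """Summarize raw values with the named function."""
    return RAW[fn](values)


def resummarize(partials, fn):
    """Combine partial summaries with the matching merge function."""
    return MERGE[fn](partials)


# Three made-up shards holding 3, 2 and 4 values.
shards = [[3, 1, 4], [1, 5], [9, 2, 6, 5]]

partial_counts = [summarize(s, "count") for s in shards]

# Re-applying count to the partial counts counts the shards, not the rows.
wrong = summarize(partial_counts, "count")

# Summing the partial counts gives the total number of rows.
right = resummarize(partial_counts, "count")
```

For sum, max and min the raw function and the merge function happen to coincide, which is why re-applying the original function works for them. A function such as avg would need more than a different merge function: each partial result would have to carry both a sum and a count.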
