
replaced the implementation of Table method dropDuplicateRows() with … #1058

Merged: 1 commit, Mar 26, 2022


lwhite1 (Collaborator) commented Mar 26, 2022

…one that uses less memory

Thanks for contributing.

Description

The method dropDuplicateRows() used memory very inefficiently. By the end of the method, it held three tables in memory:

  • the original table
  • a sorted copy of the original
  • a copy of the sorted copy, without the duplicates

In addition, the value-by-value equality comparison used a generic method that autoboxed every primitive value.
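To illustrate why a generic comparison is costly, here is a minimal sketch. It does not use Tablesaw's API: `GenericColumn`, `IntCol`, `equalGeneric`, and `equalPrimitive` are hypothetical names. The generic path must return a reference type, so each `int` is boxed into an `Integer` before comparison, while the specialized path compares the primitives directly.

```java
import java.util.Objects;

public class BoxingDemo {
    // Hypothetical generic column abstraction: get() returns a reference type,
    // so a primitive-backed column has to box every value it hands out.
    interface GenericColumn<T> {
        T get(int row);
    }

    // Hypothetical column backed by a primitive int array.
    static final class IntCol implements GenericColumn<Integer> {
        private final int[] data;

        IntCol(int... data) {
            this.data = data;
        }

        // Primitive accessor: no boxing.
        int getInt(int row) {
            return data[row];
        }

        // Generic accessor: the int is autoboxed to an Integer on return.
        @Override
        public Integer get(int row) {
            return data[row];
        }
    }

    // Works for any column type, but each call boxes two values.
    static <T> boolean equalGeneric(GenericColumn<T> col, int r1, int r2) {
        return Objects.equals(col.get(r1), col.get(r2));
    }

    // Specialized comparison on the primitives themselves.
    static boolean equalPrimitive(IntCol col, int r1, int r2) {
        return col.getInt(r1) == col.getInt(r2);
    }

    public static void main(String[] args) {
        IntCol col = new IntCol(1000, 1000, 7);
        System.out.println(equalGeneric(col, 0, 1));   // true
        System.out.println(equalPrimitive(col, 0, 1)); // true
        System.out.println(equalGeneric(col, 0, 2));   // false
    }
}
```

Both methods give the same answers; the difference is that the generic one allocates (or looks up cached) `Integer` objects on every call, which adds up when it runs once per value for every row in a large table.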

This was replaced by a method that computes a hash for each row and uses it to test for equality, eliminating the need for the sorted copy of the table. Unfortunately, the autoboxing equality test remains, at least for now; however, it is only called when duplicate rows are encountered, rather than for every row in the table.
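The hash-based approach can be sketched roughly as follows. This is an illustration under stated assumptions, not the code merged in this PR: the table is modeled as a hypothetical column-major `int[][]`, and `HashDedup`, `distinctRowIndices`, `rowHash`, and `rowsEqual` are made-up names. Each row's hash is computed from its primitive values, and the full value-by-value comparison runs only against earlier rows that share that hash. For simplicity the sketch compares primitives, whereas the PR notes that the real equality check still autoboxes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class HashDedup {
    // Hypothetical column-major table: columns[c][r] is the value in column c, row r.
    // Returns the indices of the rows to keep (the first occurrence of each distinct row).
    static List<Integer> distinctRowIndices(int[][] columns, int rowCount) {
        // Row hash -> indices of kept rows with that hash.
        Map<Integer, List<Integer>> keptByHash = new HashMap<>();
        List<Integer> kept = new ArrayList<>();
        for (int r = 0; r < rowCount; r++) {
            int h = rowHash(columns, r);
            List<Integer> candidates = keptByHash.computeIfAbsent(h, k -> new ArrayList<>());
            boolean duplicate = false;
            // Full comparison only against rows whose hash matches (possible duplicates).
            for (int earlier : candidates) {
                if (rowsEqual(columns, earlier, r)) {
                    duplicate = true;
                    break;
                }
            }
            if (!duplicate) {
                candidates.add(r);
                kept.add(r);
            }
        }
        return kept;
    }

    // Combines the row's values into one hash without boxing them.
    static int rowHash(int[][] columns, int row) {
        int h = 1;
        for (int[] col : columns) {
            h = 31 * h + Integer.hashCode(col[row]);
        }
        return h;
    }

    // Value-by-value equality; needed because different rows can share a hash.
    static boolean rowsEqual(int[][] columns, int a, int b) {
        for (int[] col : columns) {
            if (col[a] != col[b]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        int[][] columns = {
            {1, 2, 1, 3, 2},
            {10, 20, 10, 30, 21}
        };
        System.out.println(distinctRowIndices(columns, 5)); // [0, 1, 3, 4]
    }
}
```

Besides the original table, this keeps only a hash map of row indices and the list of kept indices, from which the deduplicated table can be built in one pass, so no sorted copy is needed. The trade-off is that many rows sharing one hash would make the inner comparison loop longer.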

Testing

Did you add a unit test?
yes

lwhite1 merged commit 1f668db into master on Mar 26, 2022
lwhite1 deleted the deduplicate-rows-perf-enhancement branch on March 26, 2022, 23:56