Consider sorting instead of hashing for uniqueness #736

hadley · 2020-01-08T15:47:50Z

https://news.ycombinator.com/item?id=21985246

This would allow us to use radix sort for integer indices, which would help us match data.table performance when grouping in dplyr.

DavisVaughan · 2022-09-28T14:42:28Z

See #1361 (comment) for an explanation of why we probably don't want to switch all of the dictionary functions over to sorting.

We did expose vec_locate_sorted_groups() (which uses radix ordering) for use in dplyr's group_by() (tidyverse/dplyr#6297), which was really the original motivation for this issue.
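For readers unfamiliar with the trade-off, here is a minimal Python sketch contrasting the two strategies for locating groups in a vector of non-negative integer keys. It is illustrative only and is not the vctrs implementation (which is written in C); the function names `locate_groups_hash`, `radix_order`, and `locate_groups_sorted` are hypothetical and chosen for this example.

```python
def locate_groups_hash(keys):
    """Hash-based grouping: groups come out in order of first appearance."""
    groups = {}
    for i, k in enumerate(keys):
        groups.setdefault(k, []).append(i)
    return list(groups.items())


def radix_order(keys, bits=8):
    """Stable LSD radix ordering of non-negative integers.

    Returns the permutation of positions that sorts `keys`,
    processing `bits` bits of each key per pass.
    """
    order = list(range(len(keys)))
    if not keys:
        return order
    max_key = max(keys)
    mask = (1 << bits) - 1
    shift = 0
    while (max_key >> shift) > 0:
        buckets = [[] for _ in range(1 << bits)]
        for i in order:
            buckets[(keys[i] >> shift) & mask].append(i)
        # Concatenating buckets in order keeps each pass stable.
        order = [i for bucket in buckets for i in bucket]
        shift += bits
    return order


def locate_groups_sorted(keys):
    """Sort-based grouping: groups come out in ascending key order."""
    groups = []
    for i in radix_order(keys):
        if groups and groups[-1][0] == keys[i]:
            # Same key as the previous position: extend the current run.
            groups[-1][1].append(i)
        else:
            groups.append((keys[i], [i]))
    return groups


keys = [3, 1, 3, 2, 1, 300]
print(locate_groups_hash(keys))
print(locate_groups_sorted(keys))
```

Both functions find the same groups and the same locations within each group; they differ only in the order the groups are returned. The hash version keeps first-appearance order, while the sorted version returns keys in ascending order, which is the ordering group_by() needs.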

lionel- added op:equal-compare-hash performance labels Apr 20, 2020

hadley mentioned this issue Jun 23, 2020

Native vec_order() #1143

Merged

DavisVaughan mentioned this issue Apr 13, 2021

Implement alternatives to the vec_unique_*() family #1361

Closed

DavisVaughan closed this as completed Sep 28, 2022
