Add n_unique() = length(unique(x)) - useful while grouping #884

Closed
arunsrinivasan opened this issue Oct 13, 2014 · 5 comments

@arunsrinivasan
Member

The SO question that triggered the thought:

Instead of having to do:

DT[, length(unique(.)), by=.]

We could do with:

DT[, n_unique(.), by=.]

This would be especially fast for data.tables, because we don't have to subset the entire data.table to know the number of unique values.

Here's a quick benchmark:

require(data.table)
x = sample(1e2, 1e7, TRUE)
system.time(ans1 <- length(unique(x))) # 0.667 seconds
system.time(ans2 <- length(attr(data.table:::forderv(x, retGrp=TRUE), 'starts'))) # 0.1 seconds

In addition, we could internally optimise length(unique(.)) to n_unique(.).
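To make the benchmark's second expression concrete, here is a minimal sketch of what such a helper might look like. The name n_unique is the one proposed in this issue and does not exist in data.table; the body reuses the internal forderv call from the benchmark, which is not exported and may change between versions, so treat this as an illustration rather than a recommended implementation.

```r
library(data.table)

# Hypothetical helper (name taken from this issue's proposal).
# forderv() is internal to data.table, hence `:::`; with retGrp = TRUE it
# attaches a "starts" attribute holding the start index of each group of
# equal values, so the number of groups is the number of unique values.
n_unique <- function(x) {
  grp <- data.table:::forderv(x, retGrp = TRUE)
  length(attr(grp, "starts"))
}

x <- c(3L, 1L, 3L, 2L, 1L)
n_unique(x) == length(unique(x))  # both approaches should agree
```

The point of the design is that only the group boundaries are needed, so the unique values themselves never have to be materialised.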

@matthieugomez
Contributor

I'd like this.

It would be nice if the function also had a method for data.table: relevant SO question here and there. For instance, to find the number of unique combinations (v2, v3) within groups defined by v1, one could do:

DT[, n_unique(.SD), by = v1, .SDcols = c("v2", "v3")]

@arunsrinivasan
Member Author

Implemented as uniqueN.
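A short usage sketch of uniqueN for the cases raised in this thread, assuming a data.table version that ships it with a by argument. The table DT and its columns v1, v2, v3 are made-up illustration data, not taken from the issue.

```r
library(data.table)

# Illustration data only (hypothetical).
DT <- data.table(
  v1 = c("a", "a", "a", "b", "b"),
  v2 = c(1, 1, 2, 1, 1),
  v3 = c("x", "x", "y", "x", "x")
)

# Number of distinct values in a plain vector.
n_vec <- uniqueN(DT$v2)

# Number of distinct v2 values within each v1 group.
n_by_group <- DT[, .(n = uniqueN(v2)), by = v1]

# Number of distinct (v2, v3) combinations within each v1 group.
n_combos <- DT[, .(n = uniqueN(.SD)), by = v1, .SDcols = c("v2", "v3")]

# Number of distinct rows of the whole table, judged on selected columns.
n_rows_by <- uniqueN(DT, by = c("v1", "v2"))
```

The grouped forms replace length(unique(.)) and nrow(unique(.SD)) respectively, while the by form counts distinct rows without grouping first.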

@arunsrinivasan arunsrinivasan modified the milestones: v1.9.6, v1.9.8 Jan 25, 2015
@matthieugomez
Contributor

It would be nice if the syntax were made closer to unique, with a by option (if by is not given, default to keys).

@jangorecki
Member

@matthieugomez
Contributor

Great! Sorry I was looking at the original commit.
