sample_n does not work with grouped data #405

Closed
emiliotorres opened this issue Apr 25, 2014 · 2 comments

Comments

@emiliotorres

Dear Sir,
sample_n does not respect the group_by variable: on grouped data it samples from the whole table instead of within each group. sample_frac works fine.
Best regards,
Emilio

set.seed(123)
iris %>% sample_n(5) # OK
##     Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
## 44           5.0         3.5          1.6         0.6     setosa
## 118          7.7         3.8          6.7         2.2  virginica
## 61           5.0         2.0          3.5         1.0 versicolor
## 130          7.2         3.0          5.8         1.6  virginica
## 138          6.4         3.1          5.5         1.8  virginica


set.seed(123)
ig <- iris %>% group_by(Species)
ig %>% sample_n(5) # Wrong! I expect 15 rows (5 x 3 levels of Species), but I get the same 5 rows as above
## Source: local data frame [5 x 5]
## Groups: Species

##     Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
## 44           5.0         3.5          1.6         0.6     setosa
## 118          7.7         3.8          6.7         2.2  virginica
## 61           5.0         2.0          3.5         1.0 versicolor
## 130          7.2         3.0          5.8         1.6  virginica
## 138          6.4         3.1          5.5         1.8  virginica


set.seed(123)
ig %>% sample_frac(0.1) #OK
## Source: local data frame [15 x 5]
## Groups: Species

@emiliotorres
Author

EDIT: sample_frac also fails. I expect 5 rows for each level (5 setosa, 5 versicolor, 5 virginica), but I get a 10% sample drawn from the whole data set, ignoring the groups:

set.seed(123)
a <- ig %>% sample_frac(0.1)
table(a$Species)
   setosa versicolor  virginica 
         3          6          6
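
Until the grouped methods are dispatched, per-group sampling can be approximated by hand. The following is a minimal sketch in base R only, so it does not depend on any particular dplyr version; `sample_per_group()` is a hypothetical helper introduced for illustration and is not part of dplyr.

```r
# Hypothetical helper (not a dplyr function): draw n rows from each group.
sample_per_group <- function(df, group_col, n) {
  # Split the data frame into one piece per level of the grouping column.
  pieces <- split(df, df[[group_col]], drop = TRUE)
  # Sample n row positions without replacement inside each piece.
  sampled <- lapply(pieces, function(g) {
    g[sample.int(nrow(g), n), , drop = FALSE]
  })
  # Stack the per-group samples back into a single data frame.
  do.call(rbind, sampled)
}

set.seed(123)
a <- sample_per_group(iris, "Species", 5)
table(a$Species)  # each of the three species appears exactly 5 times
```

This gives the 15 rows (5 per species) that the grouped call was expected to return. It assumes every group has at least n rows; otherwise `sample.int()` raises an error.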

@joranE
Contributor

joranE commented May 23, 2014

Silly question: I see the sample_n.grouped_df methods in sample.R, but they aren't registered in NAMESPACE, so only sample_n.data.frame is being dispatched in 0.2. Simple oversight, or is there a reason they aren't finished yet?
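
The fallback described above can be reproduced outside dplyr. This is only a sketch of S3 dispatch under stated assumptions: `pick_n()`, its methods, and the `group_col` attribute are hypothetical stand-ins, not dplyr's code. When no method for the first class is visible, dispatch moves on to the next class in the class vector, which is how grouped data ends up in the plain data.frame method.

```r
# Hypothetical generic; the names are illustrative, not dplyr's internals.
pick_n <- function(x, n) UseMethod("pick_n")

# Method for plain data frames: sample n rows from the whole table.
pick_n.data.frame <- function(x, n) {
  x[sample.int(nrow(x), n), , drop = FALSE]
}

# Stand-in for a grouped table: a data frame with an extra leading class
# and an assumed "group_col" attribute naming the grouping column.
grouped <- structure(iris,
                     class = c("grouped_df", "data.frame"),
                     group_col = "Species")

set.seed(123)
before <- pick_n(grouped, 5)
nrow(before)  # 5: no grouped_df method is visible, so groups are ignored

# Once a grouped_df method is visible, dispatch picks it first.
pick_n.grouped_df <- function(x, n) {
  groups <- x[[attr(x, "group_col")]]
  # Sample n row positions inside each group, then combine them.
  idx <- unlist(lapply(split(seq_len(nrow(x)), groups),
                       function(i) i[sample.int(length(i), n)]),
                use.names = FALSE)
  x[idx, , drop = FALSE]
}

after <- pick_n(grouped, 5)
nrow(after)  # 15: five rows from each of the three species
```

Inside a package, defining the method is not enough: it also has to be registered, typically with an `S3method()` directive in NAMESPACE, which appears to be the step the question above is pointing at.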

azvoleff added a commit to azvoleff/teamlucc that referenced this issue Jul 2, 2014
@hadley hadley closed this as completed in 9d26726 Jul 28, 2014
krlmlr pushed a commit to krlmlr/dplyr that referenced this issue Mar 2, 2016
Export sample_n/frac methods for grouped_df. Closes tidyverse#405.
@lock lock bot locked as resolved and limited conversation to collaborators Jun 10, 2018