`slice_sample()` errors if `n` bigger than number of rows #6185
Comments
I had to write my own wrapper around … One note: if you go from 1.0.7 to 1.0.8, I would not expect breaking changes in the API. This is a breaking change, because code that previously worked now fails.
I have the same problem. I have a package that randomly samples …
`slice_sample()` now returns an error when `n` is bigger than the number of existing rows. This seems to have been introduced on purpose, because there is a test specifically for this error. However, the documentation was not updated.
That's interesting. I also see the test change. In that case, the change in behavior seems to be on purpose, and I'm strongly considering rewriting my own code. I hope the description of the argument is updated so that there is no ambiguity.
> slice_sample(iris, n = Inf)
Error in `slice_sample()`:
! Problem while computing indices.
Caused by error in `sample.int()`:
! vector size cannot be infinite
Run `rlang::last_error()` to see where the error occurred.
This used to work, and would permute the rows, which was very useful. But we can use … BTW, values of …
We definitely need to retain the option to sample "n, or however many rows exist, whichever is smaller". It may be good to stop doing this silently without warning the user, but a lot of us have code that depends on the function not throwing an error in this situation.
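For ungrouped data, one way to get "n, or however many rows exist, whichever is smaller" without relying on the old behavior is to cap `n` yourself before calling `slice_sample()`. A minimal sketch, assuming an ungrouped data frame; `k` is just an illustrative name for the requested sample size:

```r
library(dplyr, warn.conflicts = FALSE)

df <- data.frame(x = 1:3)
k <- 4  # requested sample size; may exceed nrow(df)

# Cap the request at the number of available rows, so slice_sample()
# never has to draw more rows than exist without replacement.
# Only valid for ungrouped data: nrow() counts every row in the data frame.
out <- slice_sample(df, n = min(k, nrow(df)))

nrow(out)  # 3: all rows, in random order
```

This does not help with grouped data, because `nrow()` counts all rows rather than the rows in each group.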
Regression introduced in #6172: the PR doesn't explicitly mention this change and there's no NEWS bullet, so I suspect it's an omission, not a deliberate change.
Has this been addressed? I did not have this issue until recently, but now it is happening for me as well.
Reading this thread through more carefully, I think this is mainly a documentation issue: if you want to sample more rows than exist, you do need to set `replace = TRUE`, as the error suggests.
Minimal reprex:
library(dplyr, warn.conflicts = FALSE)
df <- data.frame(x = 1:3)
df %>% slice_sample(n = 4)
#> Error in `slice_sample()`:
#> ! Problem while computing indices.
#> Caused by error:
#> ! Can't sample without replacement using a size that is larger than the
#> number of rows in the data.
#> ℹ 4 rows were requested in the sample.
#> ℹ 3 rows are present in the data.
#> ℹ Set `replace = TRUE` to sample with replacement.
df %>% slice_sample(n = 4, replace = TRUE)
#> x
#> 1 1
#> 2 1
#> 3 3
#> 4 1
Created on 2022-07-21 by the reprex package (v2.0.1)
The issue is: I don't want to sample with replacement in the samples where size > n.
If I take a random sample of 100 from samples of size 94, 125, 200, and 80, I want to get back 94, 100, 100, and 80, with no replacement in the larger samples. This used to be how slice_sample() behaved, but it seems to have changed for some reason.
Best,
David Tatarakis
(In reply to Hadley Wickham's comment of Thu, Jul 21, 2022 at 12:21 PM.)
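The per-group behavior described above (94, 100, 100, 80) can be reproduced with `slice()`, which evaluates its arguments once per group, so `n()` is the current group's size. A sketch under the assumption that the data look like the example; the `group` and `id` columns are made up for illustration:

```r
library(dplyr, warn.conflicts = FALSE)

# Illustrative data with the group sizes from the example above.
sizes <- c(a = 94, b = 125, c = 200, d = 80)
df <- data.frame(
  group = rep(names(sizes), times = sizes),
  id = seq_len(sum(sizes))
)

# Inside slice(), n() is the size of the current group, so each group
# draws min(100, group size) distinct row positions without replacement.
sampled <- df %>%
  group_by(group) %>%
  slice(sample.int(n(), min(100, n()))) %>%
  ungroup()

count(sampled, group)  # n per group: 94, 100, 100, 80
```

Groups smaller than 100 come back whole, and no row is ever drawn twice.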
@dtatarak sampling sometimes with replacement and sometimes without replacement seems ill-founded to me.
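The replacement is still controlled by the `replace` argument. I think it would make sense for all the `slice_*()` functions to handle an `n` larger than the number of rows consistently; at the moment only `slice_sample()` errors:
library(dplyr, warn.conflicts = FALSE)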
#> Warning: package 'dplyr' was built under R version 4.1.2
df <- data.frame(a = 1)
df %>%
slice(1:2)
#> a
#> 1 1
df %>%
slice_head(n = 2)
#> a
#> 1 1
df %>%
slice_tail(n = 2)
#> a
#> 1 1
df %>%
slice_min(n = 2, order_by = a)
#> a
#> 1 1
df %>%
slice_max(n = 2, order_by = a)
#> a
#> 1 1
df %>%
slice_sample(n = 2)
#> Error in `slice_sample()`:
#> ! Problem while computing indices.
#> Caused by error in `sample.int()`:
#> ! cannot take a sample larger than the population when 'replace = FALSE'
Created on 2022-07-21 by the reprex package (v2.0.1)
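Because `slice_head()` already truncates when `n` is larger than the group, as the reprex shows, one workaround in the spirit of this comment is to shuffle each group first and then take its head. A sketch; `sample_at_most()` is a hypothetical helper, not a dplyr function:

```r
library(dplyr, warn.conflicts = FALSE)

# Hypothetical helper (not part of dplyr): sample at most `k` rows per group,
# without replacement.
sample_at_most <- function(data, k) {
  data %>%
    slice_sample(prop = 1) %>%  # shuffle the rows within each group
    slice_head(n = k)           # keep the first min(k, group size) rows
}

df <- data.frame(g = c("x", "x", "x", "y"), v = 1:4)

out <- df %>%
  group_by(g) %>%
  sample_at_most(2) %>%
  ungroup()

count(out, g)  # n per group: x = 2, y = 1
```

This never samples with replacement, so it matches the "give me all of them if there aren't enough" behavior requested in this thread rather than the `replace = TRUE` behavior.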
I agree. Definitely not something I do generally, but I have a specific use case for it in my work. It's useful to have a function that says "Give me a random sample that's this size without replacement. If there aren't enough, just give me all of them." I can certainly do that manually, but slice_sample() was a very convenient way of doing that within a larger dplyr statement.
Best,
David Tatarakis
(In reply to Hadley Wickham's comment of Thu, Jul 21, 2022 at 12:42 PM.)
Similar to @dtatarak, I also had (and still have) a use case for sampling more rows than exist in a table. I wrote a function for reducing partial duplicates down to one record, and if the records were "tied" I used …
The way …
Maybe clarifying this behavior for …
@eutwt ah, that makes sense. Do you want to have a go at a PR?
Has this bug been resolved in 1.0.10? I am still receiving the aforementioned error when using `slice_sample()` to resample to a fixed count in cases where some groups have fewer observations than specified by `n`.
No, but it is fixed in the development version.
This bug is almost a year old. When will the fix reach CRAN? Is there a release schedule?
@erydit FWIW, if you need this right away and can't install the dev package, this is pretty easy to "patch" by redefining … I should probably mention that I don't work at RStudio, and I'd guess this hacky patch is not recommended by them :)
`slice_sample(..., n = Inf)` no longer works, even though the documentation says it should. We therefore have to use the `prop = 1` argument instead (which, strictly speaking, is also more logical). See the related bug report: tidyverse/dplyr#6185
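When the goal is only to permute the rows, `prop = 1` does what `n = Inf` used to do. A minimal sketch on a toy data frame:

```r
library(dplyr, warn.conflicts = FALSE)

df <- data.frame(x = 1:5)

# prop = 1 requests 100% of the rows without replacement, i.e. a random
# permutation, instead of asking sample.int() for an infinite sample size.
shuffled <- slice_sample(df, prop = 1)

nrow(shuffled)  # 5
```

On grouped data the same call shuffles the rows within each group.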
In the man page for the `slice` functions, the description for the argument `n` states: …
The output of `slice_sample` used to be the same data.frame (with different ordering) if `n` is higher than the number of rows, but it is now returning an error.
Created on 2022-02-11 by the reprex package (v2.0.1)