high dimensional datasets? #467
This is happening because of the way R handles many variables in formulas. For now, you can pass your data without a formula and manually update the roles.
library(tidymodels)
testdf = as_tibble(matrix(rnorm(500 * 20000), ncol = 20000))
rec = recipe(testdf) %>%
  update_role(everything())
rec
#> Data Recipe
#>
#> Inputs:
#>
#>       role #variables
#>  predictor      20000
Created on 2020-02-07 by the reprex package (v0.3.0)
Session info
devtools::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#> setting value
#> version R version 3.6.0 (2019-04-26)
#> os macOS Mojave 10.14.6
#> system x86_64, darwin15.6.0
#> ui X11
#> language (EN)
#> collate en_US.UTF-8
#> ctype en_US.UTF-8
#> tz America/Los_Angeles
#> date 2020-02-07
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#> package * version date lib
#> assertthat 0.2.1 2019-03-21 [1]
#> backports 1.1.5 2019-10-02 [1]
#> base64enc 0.1-3 2015-07-28 [1]
#> bayesplot 1.7.1 2019-12-01 [1]
#> boot 1.3-23 2019-07-05 [1]
#> broom * 0.5.3 2019-12-14 [1]
#> callr 3.4.1 2020-01-24 [1]
#> class 7.3-15 2019-01-01 [1]
#> cli 2.0.1.9000 2020-01-31 [1]
#> codetools 0.2-16 2018-12-24 [1]
#> colorspace 1.4-1 2019-03-18 [1]
#> colourpicker 1.0 2017-09-27 [1]
#> crayon 1.3.4 2020-01-31 [1]
#> crosstalk 1.0.0 2016-12-21 [1]
#> desc 1.2.0 2018-05-01 [1]
#> devtools 2.2.1 2019-09-24 [1]
#> dials * 0.0.4.9000 2020-01-03 [1]
#> DiceDesign 1.8-1 2019-07-31 [1]
#> digest 0.6.23 2019-11-23 [1]
#> dplyr * 0.8.4 2020-01-31 [1]
#> DT 0.10 2019-11-12 [1]
#> dygraphs 1.1.1.6 2018-07-11 [1]
#> ellipsis 0.3.0 2019-09-20 [1]
#> evaluate 0.14 2019-05-28 [1]
#> fansi 0.4.1 2020-01-08 [1]
#> fastmap 1.0.1 2019-10-08 [1]
#> foreach 1.4.7 2019-07-27 [1]
#> fs 1.3.1 2019-05-06 [1]
#> furrr 0.1.0 2018-05-16 [1]
#> future 1.15.1 2019-11-25 [1]
#> generics 0.0.2 2018-11-29 [1]
#> ggplot2 * 3.3.0.9000 2020-01-31 [1]
#> ggridges 0.5.1 2018-09-27 [1]
#> globals 0.12.5 2019-12-07 [1]
#> glue 1.3.1 2019-03-12 [1]
#> gower 0.2.1 2019-05-14 [1]
#> GPfit 1.0-8 2019-02-08 [1]
#> gridExtra 2.3 2017-09-09 [1]
#> gtable 0.3.0 2019-03-25 [1]
#> gtools 3.8.1 2018-06-26 [1]
#> highr 0.8 2019-03-20 [1]
#> htmltools 0.4.0 2019-10-04 [1]
#> htmlwidgets 1.5.1 2019-10-08 [1]
#> httpuv 1.5.2 2019-09-11 [1]
#> igraph 1.2.4.2 2019-11-27 [1]
#> infer * 0.5.1 2019-11-19 [1]
#> inline 0.3.15 2018-05-18 [1]
#> ipred 0.9-9 2019-04-28 [1]
#> iterators 1.0.12 2019-07-26 [1]
#> janeaustenr 0.1.5 2017-06-10 [1]
#> knitr 1.27.2 2020-01-23 [1]
#> later 1.0.0 2019-10-04 [1]
#> lattice 0.20-38 2018-11-04 [1]
#> lava 1.6.6 2019-08-01 [1]
#> lhs 1.0.1 2019-02-03 [1]
#> lifecycle 0.1.0 2019-08-01 [1]
#> listenv 0.8.0 2019-12-05 [1]
#> lme4 1.1-21 2019-03-05 [1]
#> loo 2.1.0 2019-03-13 [1]
#> lubridate 1.7.4 2018-04-11 [1]
#> magrittr 1.5 2014-11-22 [1]
#> markdown 1.1 2019-08-07 [1]
#> MASS 7.3-51.4 2019-03-31 [1]
#> Matrix 1.2-18 2019-11-27 [1]
#> matrixStats 0.55.0 2019-09-07 [1]
#> memoise 1.1.0 2017-04-21 [1]
#> mime 0.9 2020-02-04 [1]
#> miniUI 0.1.1.1 2018-05-18 [1]
#> minqa 1.2.4 2014-10-09 [1]
#> munsell 0.5.0 2018-06-12 [1]
#> nlme 3.1-143 2019-12-10 [1]
#> nloptr 1.2.1 2018-10-03 [1]
#> nnet 7.3-12 2016-02-02 [1]
#> parsnip * 0.0.4.9000 2019-12-25 [1]
#> pillar 1.4.3 2019-12-20 [1]
#> pkgbuild 1.0.6 2019-10-09 [1]
#> pkgconfig 2.0.3 2019-09-22 [1]
#> pkgload 1.0.2 2018-10-29 [1]
#> plyr 1.8.5 2019-12-10 [1]
#> prettyunits 1.1.1 2020-01-24 [1]
#> pROC 1.16.1 2020-01-14 [1]
#> processx 3.4.1 2019-07-18 [1]
#> prodlim 2019.11.13 2019-11-17 [1]
#> promises 1.1.0 2019-10-04 [1]
#> ps 1.3.0 2018-12-21 [1]
#> purrr * 0.3.3 2019-10-18 [1]
#> R6 2.4.1 2019-11-12 [1]
#> Rcpp 1.0.3 2019-11-08 [1]
#> recipes * 0.1.9 2020-01-16 [1]
#> remotes 2.1.0.9000 2020-01-31 [1]
#> reshape2 1.4.3 2017-12-11 [1]
#> rlang 0.4.4 2020-01-28 [1]
#> rmarkdown 2.1 2020-01-20 [1]
#> rpart 4.1-15 2019-04-12 [1]
#> rprojroot 1.3-2 2018-01-03 [1]
#> rsample * 0.0.5 2019-07-12 [1]
#> rsconnect 0.8.16 2019-12-13 [1]
#> rstan 2.19.2 2019-07-09 [1]
#> rstanarm 2.19.2 2019-10-03 [1]
#> rstantools 2.0.0 2019-09-15 [1]
#> rstudioapi 0.10.0-9003 2020-01-31 [1]
#> scales * 1.1.0 2019-11-18 [1]
#> sessioninfo 1.1.1 2018-11-05 [1]
#> shiny 1.4.0 2019-10-10 [1]
#> shinyjs 1.0 2018-01-08 [1]
#> shinystan 2.5.0 2018-05-01 [1]
#> shinythemes 1.1.2 2018-11-06 [1]
#> SnowballC 0.6.0 2019-01-15 [1]
#> StanHeaders 2.19.0 2019-09-07 [1]
#> stringi 1.4.5 2020-01-11 [1]
#> stringr 1.4.0 2019-02-10 [1]
#> survival 3.1-8 2019-12-03 [1]
#> testthat 2.3.1 2019-12-01 [1]
#> threejs 0.3.1 2017-08-13 [1]
#> tibble * 2.1.3 2019-06-06 [1]
#> tidymodels * 0.0.3 2019-10-04 [1]
#> tidyposterior 0.0.2 2018-11-15 [1]
#> tidypredict 0.4.3 2019-09-03 [1]
#> tidyr * 1.0.2 2020-01-24 [1]
#> tidyselect 1.0.0 2020-01-27 [1]
#> tidytext 0.2.2.900 2019-10-19 [1]
#> timeDate 3043.102 2018-02-21 [1]
#> tokenizers 0.2.1 2018-03-29 [1]
#> usethis 1.5.1.9000 2020-02-05 [1]
#> vctrs 0.2.99.9005 2020-02-05 [1]
#> withr 2.1.2.9000 2020-01-31 [1]
#> workflows 0.0.0.9002 2019-12-21 [1]
#> xfun 0.12 2020-01-13 [1]
#> xtable 1.8-4 2019-04-21 [1]
#> xts 0.11-2 2018-11-05 [1]
#> yaml 2.2.1 2020-02-01 [1]
#> yardstick * 0.0.5.9000 2020-01-31 [1]
#> zoo 1.8-6 2019-05-28 [1]
#> source
#> CRAN (R 3.6.0)
#> CRAN (R 3.6.0)
#> CRAN (R 3.6.0)
#> CRAN (R 3.6.0)
#> CRAN (R 3.6.0)
#> CRAN (R 3.6.0)
#> CRAN (R 3.6.0)
#> CRAN (R 3.6.0)
#> Github (r-lib/cli@e9f041e)
#> CRAN (R 3.6.0)
#> CRAN (R 3.6.0)
#> CRAN (R 3.6.0)
#> Github (r-lib/crayon@f4bc7b8)
#> CRAN (R 3.6.0)
#> CRAN (R 3.6.0)
#> CRAN (R 3.6.0)
#> local
#> CRAN (R 3.6.0)
#> CRAN (R 3.6.0)
#> CRAN (R 3.6.0)
#> CRAN (R 3.6.0)
#> CRAN (R 3.6.0)
#> CRAN (R 3.6.0)
#> CRAN (R 3.6.0)
#> CRAN (R 3.6.0)
#> CRAN (R 3.6.0)
#> CRAN (R 3.6.0)
#> CRAN (R 3.6.0)
#> CRAN (R 3.6.0)
#> CRAN (R 3.6.0)
#> CRAN (R 3.6.0)
#> Github (tidyverse/ggplot2@81ffdd0)
#> CRAN (R 3.6.0)
#> CRAN (R 3.6.0)
#> CRAN (R 3.6.0)
#> CRAN (R 3.6.0)
#> CRAN (R 3.6.0)
#> CRAN (R 3.6.0)
#> CRAN (R 3.6.0)
#> CRAN (R 3.6.0)
#> CRAN (R 3.6.0)
#> CRAN (R 3.6.0)
#> CRAN (R 3.6.0)
#> CRAN (R 3.6.0)
#> CRAN (R 3.6.0)
#> CRAN (R 3.6.0)
#> CRAN (R 3.6.0)
#> CRAN (R 3.6.0)
#> CRAN (R 3.6.0)
#> CRAN (R 3.6.0)
#> Github (yihui/knitr@ab191b0)
#> CRAN (R 3.6.0)
#> CRAN (R 3.6.0)
#> CRAN (R 3.6.0)
#> CRAN (R 3.6.0)
#> CRAN (R 3.6.0)
#> CRAN (R 3.6.0)
#> CRAN (R 3.6.0)
#> CRAN (R 3.6.0)
#> CRAN (R 3.6.0)
#> CRAN (R 3.6.0)
#> CRAN (R 3.6.0)
#> CRAN (R 3.6.0)
#> CRAN (R 3.6.0)
#> CRAN (R 3.6.0)
#> CRAN (R 3.6.0)
#> CRAN (R 3.6.0)
#> CRAN (R 3.6.0)
#> CRAN (R 3.6.0)
#> CRAN (R 3.6.0)
#> CRAN (R 3.6.0)
#> CRAN (R 3.6.0)
#> CRAN (R 3.6.0)
#> Github (tidymodels/parsnip@2e5d3fa)
#> CRAN (R 3.6.0)
#> CRAN (R 3.6.0)
#> CRAN (R 3.6.0)
#> CRAN (R 3.6.0)
#> CRAN (R 3.6.0)
#> CRAN (R 3.6.0)
#> CRAN (R 3.6.0)
#> CRAN (R 3.6.0)
#> CRAN (R 3.6.0)
#> CRAN (R 3.6.0)
#> CRAN (R 3.6.0)
#> CRAN (R 3.6.0)
#> CRAN (R 3.6.0)
#> CRAN (R 3.6.0)
#> Github (tidymodels/recipes@b40a0cf)
#> Github (r-lib/remotes@8d8d545)
#> CRAN (R 3.6.0)
#> CRAN (R 3.6.0)
#> CRAN (R 3.6.0)
#> CRAN (R 3.6.0)
#> CRAN (R 3.6.0)
#> CRAN (R 3.6.0)
#> CRAN (R 3.6.0)
#> CRAN (R 3.6.0)
#> CRAN (R 3.6.0)
#> CRAN (R 3.6.0)
#> Github (rstudio/rstudioapi@eab7bcc)
#> CRAN (R 3.6.0)
#> CRAN (R 3.6.0)
#> CRAN (R 3.6.0)
#> CRAN (R 3.6.0)
#> CRAN (R 3.6.0)
#> CRAN (R 3.6.0)
#> CRAN (R 3.6.0)
#> CRAN (R 3.6.0)
#> CRAN (R 3.6.0)
#> CRAN (R 3.6.0)
#> CRAN (R 3.6.0)
#> CRAN (R 3.6.0)
#> CRAN (R 3.6.0)
#> CRAN (R 3.6.0)
#> CRAN (R 3.6.0)
#> CRAN (R 3.6.0)
#> CRAN (R 3.6.0)
#> CRAN (R 3.6.0)
#> CRAN (R 3.6.0)
#> Github (juliasilge/tidytext@525c1f7)
#> CRAN (R 3.6.0)
#> CRAN (R 3.6.0)
#> Github (r-lib/usethis@ff34e40)
#> Github (r-lib/vctrs@9970a0b)
#> Github (r-lib/withr@16d47fd)
#> Github (tidymodels/workflows@305fe6a)
#> CRAN (R 3.6.0)
#> CRAN (R 3.6.0)
#> CRAN (R 3.6.0)
#> CRAN (R 3.6.0)
#> local
#> CRAN (R 3.6.0)
#>
#> [1] /Library/Frameworks/R.framework/Versions/3.6/Resources/library
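The workaround above marks every column as a predictor. If the data also contains a response column, one possible extension is to reassign that column's role afterwards. The following is a minimal sketch, not part of the original answer: the data frame `dat`, its column `y`, and the small dimensions are made up for illustration, and it assumes `update_role()` accepts a bare column name together with `new_role`.

```r
library(recipes)

# Hypothetical small example: 50 predictor columns plus an outcome column `y`.
set.seed(1)
dat <- as.data.frame(matrix(rnorm(20 * 50), ncol = 50))
dat$y <- rnorm(20)

# Build the recipe without a formula, mark everything as a predictor,
# then switch `y` to the outcome role.
rec <- recipe(dat) %>%
  update_role(everything(), new_role = "predictor") %>%
  update_role(y, new_role = "outcome")

# One row per variable, with its assigned role.
info <- summary(rec)
info
```

Because no formula is parsed, the cost of this approach should not depend on how R expands a formula with thousands of terms.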
Thanks! That worked.
Probably not. This isn't something that is usually encountered, and formulas become very expensive with a large number of columns.
Hello,
I'm running into problems when using high-dimensional datasets (as in genomic problems) with the recipes package: it produces stack overflow errors.
Minimal reproducible code:
The output is:
Session info: