Consider using reframe() inside of metric_summarizer() rather than summarise()? #355

Closed
mikemahoney218 opened this issue Jan 30, 2023 · 6 comments

@mikemahoney218
Member

Feature

As of dplyr 1.1.0, summarise() emits a deprecation warning when it returns more (or fewer) than 1 row per group. As a result, metric_summarizer() now warns when metrics return more than one row for ungrouped data frames:

library(yardstick)
#> For binary classification, the first factor level is assumed to be the event.
#> Use the argument `event_level = "second"` to alter this as needed.
library(rlang)

resid <- function(data, ...) {
  UseMethod("resid")
}

resid <- new_numeric_metric(resid, direction = "zero")

resid.data.frame <- function(data, truth, estimate, na_rm = TRUE, ...) {
  metric_summarizer(
    metric_nm = "residual",
    metric_fn = resid_vec,
    data = data,
    truth = !! enquo(truth),
    estimate = !! enquo(estimate),
    na_rm = na_rm,
    metric_fn_options = list(...)
  )
}

resid_vec <- function(truth, estimate, wt, na_rm = TRUE, ...) {
  
  resid_impl <- function(truth, estimate, ...) {
    truth - estimate
  }
  
  metric_vec_template(
    metric_impl = resid_impl,
    truth = truth,
    estimate = estimate,
    cls = "numeric",
    na_rm = na_rm,
    ...
  )
}

resid(data.frame(x = 1:5, y = 6:10), x, y)
#> Warning: Returning more (or less) than 1 row per `summarise()` group was deprecated in
#> dplyr 1.1.0.
#> ℹ Please use `reframe()` instead.
#> ℹ When switching from `summarise()` to `reframe()`, remember that `reframe()`
#>   always returns an ungrouped data frame and adjust accordingly.
#> ℹ The deprecated feature was likely used in the yardstick package.
#>   Please report the issue at <https://github.com/tidymodels/yardstick/issues>.
#> # A tibble: 5 × 3
#>   .metric  .estimator .estimate
#>   <chr>    <chr>          <int>
#> 1 residual standard          -5
#> 2 residual standard          -5
#> 3 residual standard          -5
#> 4 residual standard          -5
#> 5 residual standard          -5

Created on 2023-01-30 with reprex v2.0.2

Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.2.2 Patched (2022-11-10 r83330)
#>  os       Ubuntu 22.04.1 LTS
#>  system   x86_64, linux-gnu
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       America/New_York
#>  date     2023-01-30
#>  pandoc   2.19.2 @ /usr/lib/rstudio/bin/quarto/bin/tools/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version date (UTC) lib source
#>  cli           3.6.0   2023-01-09 [1] CRAN (R 4.2.2)
#>  digest        0.6.31  2022-12-11 [1] CRAN (R 4.2.2)
#>  dplyr         1.1.0   2023-01-29 [1] CRAN (R 4.2.2)
#>  evaluate      0.20    2023-01-17 [1] CRAN (R 4.2.2)
#>  fansi         1.0.4   2023-01-22 [1] CRAN (R 4.2.2)
#>  fastmap       1.1.0   2021-01-25 [1] CRAN (R 4.2.2)
#>  fs            1.6.0   2023-01-23 [1] CRAN (R 4.2.2)
#>  generics      0.1.3   2022-07-05 [1] CRAN (R 4.2.2)
#>  glue          1.6.2   2022-02-24 [1] CRAN (R 4.2.2)
#>  htmltools     0.5.4   2022-12-07 [1] CRAN (R 4.2.2)
#>  knitr         1.42    2023-01-25 [1] CRAN (R 4.2.2)
#>  lifecycle     1.0.3   2022-10-07 [1] CRAN (R 4.2.2)
#>  magrittr      2.0.3   2022-03-30 [1] CRAN (R 4.2.2)
#>  pillar        1.8.1   2022-08-19 [1] CRAN (R 4.2.2)
#>  pkgconfig     2.0.3   2019-09-22 [1] CRAN (R 4.2.2)
#>  purrr         1.0.1   2023-01-10 [1] CRAN (R 4.2.2)
#>  R.cache       0.16.0  2022-07-21 [1] CRAN (R 4.2.2)
#>  R.methodsS3   1.8.2   2022-06-13 [1] CRAN (R 4.2.2)
#>  R.oo          1.25.0  2022-06-12 [1] CRAN (R 4.2.2)
#>  R.utils       2.12.2  2022-11-11 [1] CRAN (R 4.2.2)
#>  R6            2.5.1   2021-08-19 [1] CRAN (R 4.2.2)
#>  reprex        2.0.2   2022-08-17 [1] CRAN (R 4.2.2)
#>  rlang       * 1.0.6   2022-09-24 [1] CRAN (R 4.2.2)
#>  rmarkdown     2.20    2023-01-19 [1] CRAN (R 4.2.2)
#>  rstudioapi    0.14    2022-08-22 [1] CRAN (R 4.2.2)
#>  sessioninfo   1.2.2   2021-12-06 [1] CRAN (R 4.2.2)
#>  styler        1.8.1   2022-11-07 [1] CRAN (R 4.2.2)
#>  tibble        3.1.8   2022-07-22 [1] CRAN (R 4.2.2)
#>  tidyselect    1.2.0   2022-10-10 [1] CRAN (R 4.2.2)
#>  utf8          1.2.2   2021-07-24 [1] CRAN (R 4.2.2)
#>  vctrs         0.5.2   2023-01-23 [1] CRAN (R 4.2.2)
#>  withr         2.5.0   2022-03-03 [1] CRAN (R 4.2.2)
#>  xfun          0.36    2022-12-21 [1] CRAN (R 4.2.2)
#>  yaml          2.3.7   2023-01-23 [1] CRAN (R 4.2.2)
#>  yardstick   * 1.1.0   2022-09-07 [1] CRAN (R 4.2.2)
#> 
#>  [1] /home/mikemahoney218/R/x86_64-pc-linux-gnu-library/4.2
#>  [2] /usr/local/lib/R/site-library
#>  [3] /usr/lib/R/site-library
#>  [4] /usr/lib/R/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────
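
The warning comes from dplyr rather than from yardstick, so the same behavior can be reproduced with dplyr alone. The following is a minimal sketch, assuming dplyr >= 1.1.0 is installed; the data frame `df` and the column name `.estimate` are illustrative choices that mirror the reprex, not yardstick internals.

```r
library(dplyr)

# Same toy data as the reprex: five observations, residual of -5 each
df <- data.frame(x = 1:5, y = 6:10)

# reframe() allows any number of rows per group and always returns
# an ungrouped data frame, so a vector-valued result is fine here.
out <- reframe(df, .estimate = x - y)

# By contrast, summarise(df, .estimate = x - y) is the call pattern
# that triggers the deprecation warning in dplyr 1.1.0.
out
```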

The actual place I'm running into this is in waywiser, which has a number of "local" spatial autocorrelation metrics that return one value per observation, representing the local spatial autocorrelation at an individual point:

library(waywiser)

guerry_model <- guerry
guerry_lm <- lm(Crm_prs ~ Litercy, guerry_model)
guerry_model$predictions <- predict(guerry_lm, guerry_model)

# same issue with local_geary_c ; local_getis_ord_g
ww_local_moran_i(guerry_model, Crm_prs, predictions)
#> Warning: Returning more (or less) than 1 row per `summarise()` group was deprecated in
#> dplyr 1.1.0.
#> ℹ Please use `reframe()` instead.
#> ℹ When switching from `summarise()` to `reframe()`, remember that `reframe()`
#>   always returns an ungrouped data frame and adjust accordingly.
#> ℹ The deprecated feature was likely used in the yardstick package.
#>   Please report the issue at <https://github.com/tidymodels/yardstick/issues>.
#> # A tibble: 85 × 3
#>    .metric       .estimator .estimate
#>    <chr>         <chr>          <dbl>
#>  1 local_moran_i standard      0.530 
#>  2 local_moran_i standard      0.858 
#>  3 local_moran_i standard      0.759 
#>  4 local_moran_i standard      0.732 
#>  5 local_moran_i standard      0.207 
#>  6 local_moran_i standard      0.860 
#>  7 local_moran_i standard      0.692 
#>  8 local_moran_i standard      1.69  
#>  9 local_moran_i standard     -0.0109
#> 10 local_moran_i standard      0.710 
#> # … with 75 more rows

Created on 2023-01-30 with reprex v2.0.2

Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.2.2 Patched (2022-11-10 r83330)
#>  os       Ubuntu 22.04.1 LTS
#>  system   x86_64, linux-gnu
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       America/New_York
#>  date     2023-01-30
#>  pandoc   2.19.2 @ /usr/lib/rstudio/bin/quarto/bin/tools/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version    date (UTC) lib source
#>  boot          1.3-28.1   2022-11-22 [1] CRAN (R 4.2.2)
#>  class         7.3-20     2022-01-13 [1] CRAN (R 4.2.2)
#>  classInt      0.4-8      2022-09-29 [1] CRAN (R 4.2.2)
#>  cli           3.6.0      2023-01-09 [1] CRAN (R 4.2.2)
#>  DBI           1.1.3      2022-06-18 [1] CRAN (R 4.2.2)
#>  deldir        1.0-6      2021-10-23 [1] CRAN (R 4.2.2)
#>  digest        0.6.31     2022-12-11 [1] CRAN (R 4.2.2)
#>  dplyr         1.1.0      2023-01-29 [1] CRAN (R 4.2.2)
#>  e1071         1.7-12     2022-10-24 [1] CRAN (R 4.2.2)
#>  evaluate      0.20       2023-01-17 [1] CRAN (R 4.2.2)
#>  fansi         1.0.4      2023-01-22 [1] CRAN (R 4.2.2)
#>  fastmap       1.1.0      2021-01-25 [1] CRAN (R 4.2.2)
#>  fs            1.6.0      2023-01-23 [1] CRAN (R 4.2.2)
#>  generics      0.1.3      2022-07-05 [1] CRAN (R 4.2.2)
#>  glue          1.6.2      2022-02-24 [1] CRAN (R 4.2.2)
#>  htmltools     0.5.4      2022-12-07 [1] CRAN (R 4.2.2)
#>  KernSmooth    2.23-20    2021-05-03 [1] CRAN (R 4.2.2)
#>  knitr         1.42       2023-01-25 [1] CRAN (R 4.2.2)
#>  lattice       0.20-45    2021-09-22 [1] CRAN (R 4.2.2)
#>  lifecycle     1.0.3      2022-10-07 [1] CRAN (R 4.2.2)
#>  magrittr      2.0.3      2022-03-30 [1] CRAN (R 4.2.2)
#>  pillar        1.8.1      2022-08-19 [1] CRAN (R 4.2.2)
#>  pkgconfig     2.0.3      2019-09-22 [1] CRAN (R 4.2.2)
#>  proxy         0.4-27     2022-06-09 [1] CRAN (R 4.2.2)
#>  purrr         1.0.1      2023-01-10 [1] CRAN (R 4.2.2)
#>  R.cache       0.16.0     2022-07-21 [1] CRAN (R 4.2.2)
#>  R.methodsS3   1.8.2      2022-06-13 [1] CRAN (R 4.2.2)
#>  R.oo          1.25.0     2022-06-12 [1] CRAN (R 4.2.2)
#>  R.utils       2.12.2     2022-11-11 [1] CRAN (R 4.2.2)
#>  R6            2.5.1      2021-08-19 [1] CRAN (R 4.2.2)
#>  Rcpp          1.0.10     2023-01-22 [1] CRAN (R 4.2.2)
#>  reprex        2.0.2      2022-08-17 [1] CRAN (R 4.2.2)
#>  rlang         1.0.6      2022-09-24 [1] CRAN (R 4.2.2)
#>  rmarkdown     2.20       2023-01-19 [1] CRAN (R 4.2.2)
#>  rstudioapi    0.14       2022-08-22 [1] CRAN (R 4.2.2)
#>  s2            1.1.2      2023-01-12 [1] CRAN (R 4.2.2)
#>  sessioninfo   1.2.2      2021-12-06 [1] CRAN (R 4.2.2)
#>  sf            1.0-9      2022-11-08 [1] CRAN (R 4.2.2)
#>  sp            1.6-0      2023-01-19 [1] CRAN (R 4.2.2)
#>  spData        2.2.1      2022-11-15 [1] CRAN (R 4.2.2)
#>  spDataLarge   2.0.9      2023-01-19 [1] https://geocompr.r-universe.dev (R 4.2.2)
#>  spdep         1.2-7      2022-10-01 [1] CRAN (R 4.2.2)
#>  styler        1.8.1      2022-11-07 [1] CRAN (R 4.2.2)
#>  tibble        3.1.8      2022-07-22 [1] CRAN (R 4.2.2)
#>  tidyselect    1.2.0      2022-10-10 [1] CRAN (R 4.2.2)
#>  units         0.8-1      2022-12-10 [1] CRAN (R 4.2.2)
#>  utf8          1.2.2      2021-07-24 [1] CRAN (R 4.2.2)
#>  vctrs         0.5.2      2023-01-23 [1] CRAN (R 4.2.2)
#>  waywiser    * 0.2.0.9000 2023-01-25 [1] local
#>  withr         2.5.0      2022-03-03 [1] CRAN (R 4.2.2)
#>  wk            0.7.1      2022-12-09 [1] CRAN (R 4.2.2)
#>  xfun          0.36       2022-12-21 [1] CRAN (R 4.2.2)
#>  yaml          2.3.7      2023-01-23 [1] CRAN (R 4.2.2)
#>  yardstick     1.1.0      2022-09-07 [1] CRAN (R 4.2.2)
#> 
#>  [1] /home/mikemahoney218/R/x86_64-pc-linux-gnu-library/4.2
#>  [2] /usr/local/lib/R/site-library
#>  [3] /usr/lib/R/site-library
#>  [4] /usr/lib/R/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────

I know this is a bit of an off-label use for yardstick, but being able to plug these local metrics into the yardstick framework has been really useful so far. Is there any chance you'd consider using reframe() instead of summarise() here? Or, are there any workarounds you can think of for this issue?

For context, in my actual code, I don't think I can easily group the data frame so each row is its own grouping, because spatial autocorrelation requires the information about residual values for all neighboring observations as well as the observation itself.
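
One possible stopgap, sketched here under assumptions, is to muffle only the deprecation warning around the metric call. The helper name `quiet_deprecation()` is hypothetical. It relies on lifecycle deprecation warnings carrying the condition class `lifecycle_warning_deprecated`, and it silences every deprecation warning raised inside the expression, not just the one from summarise().

```r
# Hypothetical helper: evaluate `expr`, muffling lifecycle deprecation
# warnings while letting all other warnings through unchanged.
quiet_deprecation <- function(expr) {
  withCallingHandlers(
    expr,
    lifecycle_warning_deprecated = function(w) invokeRestart("muffleWarning")
  )
}

# Usage sketch with the metric from the reprex:
# quiet_deprecation(resid(data.frame(x = 1:5, y = 6:10), x, y))
```

This hides the symptom only. The multi-row summarise() call still happens underneath, so it is not a long-term fix.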

@EmilHvitfeldt
Member

So this is a little weird right now: the metric_summarizer() that you are using is being deprecated in dev. I imagine that you should probably create a new *_metric_summarizer() that fits your needs in {waywiser} (I will of course help with that), similarly to #322.

Do you have any timelines that you need things by?

@mikemahoney218
Member Author

No timelines! That makes sense to me; it doesn't look like it'll be that painful to make a custom summarizer (reframer?). Thanks for the pointer on the direction things are going 😄
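
A custom "reframer" along these lines could look roughly like the sketch below. It is not the yardstick or waywiser implementation: the function name `local_metric_reframer()` and its arguments are hypothetical, and it skips the validation, case weights, and estimator handling that a real summarizer would need. It only shows the core idea of swapping summarise() for reframe() while keeping the usual `.metric`, `.estimator`, `.estimate` columns.

```r
library(dplyr)
library(rlang)

# Hypothetical helper, for illustration only
local_metric_reframer <- function(metric_nm, metric_fn, data, truth, estimate) {
  truth <- enquo(truth)
  estimate <- enquo(estimate)

  # reframe() permits one row per observation instead of one row per group
  out <- reframe(
    data,
    .estimate = metric_fn(truth = !!truth, estimate = !!estimate)
  )

  # Add the standard yardstick-style identifier columns in front
  mutate(out, .metric = metric_nm, .estimator = "standard", .before = .estimate)
}

res <- local_metric_reframer(
  "residual",
  function(truth, estimate) truth - estimate,
  data.frame(x = 1:5, y = 6:10),
  x,
  y
)
```

Because reframe() always returns an ungrouped data frame, any grouping on `data` shows up as ordinary columns in the result rather than as groups.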

@EmilHvitfeldt
Member

I'm doing a yardstick early next week, fyi

@DavisVaughan
Member

I think we still expect most usage of yardstick to return 1 row per group, so sticking with summarise() sounds like the right decision to me. It avoids accidental errors when implementing new metrics (like accidentally returning a size 0 result in one of the groups).
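
The size 0 concern can be illustrated with a small sketch, assuming dplyr >= 1.1.0. The function `buggy_metric()` and the data are made up for this example. With reframe(), a group whose metric returns a zero-length result simply disappears from the output, with no warning or error.

```r
library(dplyr)

# Made-up data: group "b" contains only a missing value
df <- data.frame(g = c("a", "a", "b"), x = c(1, 2, NA))

# Made-up metric with a bug: it returns numeric(0) when every value
# is missing, instead of a single NA
buggy_metric <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) == 0) {
    return(numeric())
  }
  mean(x)
}

# reframe() accepts the zero-row result and drops group "b"
out <- reframe(group_by(df, g), .estimate = buggy_metric(x))
out
```

With summarise() the same bug is flagged by the "more (or less) than 1 row" warning, which is the safety net being described here.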

@EmilHvitfeldt
Member

Thank you all, this issue has been resolved, and I will thus close it.

@github-actions

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

@github-actions github-actions bot locked and limited conversation to collaborators Feb 22, 2023
3 participants