Skip to content

Lint some code plus improve performance with base R and vctrs + reduce pipe #1728

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 9 commits into from
Jun 21, 2024

Conversation

olivroy
Copy link
Collaborator

@olivroy olivroy commented Jun 20, 2024

Summary

This PR updates some code and has 2 goals

Performance

Partially inspired by https://www.tidyverse.org/blog/2023/04/performant-packages/

  • Use more vctrs and base R instead of dplyr when possible
  • use fixed = TRUE
  • Simplify some dplyr code
  • a tibble is not always necessary. and can be replaced with vctrs::data_frame() for complex cases or data.frame()

linting / code consistency

  • Reduce magrittr pipe in codebase
  • Use magrittr pipe in test-fmt_percent.R
  • linting suggested by lintr
  • space and quotes

Checklist

#performance

Other possibilities, see if we can reduce as_tibble() calls

Profiling

Before:
image
This PR
image

Seems to be a significant memory usage gain!

with the following example:

The overall impact may not be that big, but still base R has some interesting improvements

bench::mark(
  mtcars |> dplyr::filter(vs > 5) |> dplyr::pull(cyl),
  mtcars[mtcars$vs > 5, "cyl", drop = TRUE]
)
#> # A tibble: 2 × 6
#>   expression                            min  median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>                        <bch:t> <bch:t>     <dbl> <bch:byt>    <dbl>
#> 1 "dplyr::pull(dplyr::filter(mtcar…  1.08ms  1.43ms      556.    4.05MB     8.46
#> 2 "mtcars[mtcars$vs > 5, \"cyl\", …  16.7µs  19.5µs    36483.      184B     7.30

bench::mark(
  mtcars |> dplyr::arrange(cyl),
  mtcars[order(mtcars$cyl), ]
)
#> # A tibble: 2 × 6
#>   expression                       min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>                  <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 dplyr::arrange(mtcars, cyl)    1.6ms   1.93ms      468.    4.24MB     6.36
#> 2 mtcars[order(mtcars$cyl), ]  107.8µs  122.7µs     6326.    3.86KB    12.6

I used this example

library(dplyr)
library(gt) # or load_all()
best_gas_mileage_city <- 
  gtcars |> 
  arrange(desc(mpg_c)) |>
  slice(1) |>
  mutate(car = paste(mfr, model)) |>
  pull(car)

# Use dplyr functions to get the car with the highest horsepower
# this will be used to target the correct cell for a footnote
highest_horsepower <- 
  gtcars |> 
  arrange(desc(hp)) |>
  slice(1) |>
  mutate(car = paste(mfr, model)) |>
  pull(car)

# Define our preferred order for `ctry_origin`
order_countries <- c("Germany", "Italy", "United States", "Japan")

# Create a display table with `gtcars`, using all of the previous
# statements piped together + additional `tab_footnote()` stmts
tab <-
  gtcars |>
  arrange(
    factor(ctry_origin, levels = order_countries),
    mfr, desc(msrp)
  ) |>
  mutate(car = paste(mfr, model)) |>
  select(-mfr, -model) |>
  group_by(ctry_origin) |>
  gt(rowname_col = "car") |>
  cols_hide(columns = c(drivetrain, bdy_style)) |>
  cols_move(
    columns = c(trsmn, mpg_c, mpg_h),
    after = trim
  ) |>
  tab_spanner(
    label = "Performance",
    columns = c(mpg_c, mpg_h, hp, hp_rpm, trq, trq_rpm)
  ) |>
  cols_merge(
    columns = c(mpg_c, mpg_h),
    pattern = "<<{1}c<br>{2}h>>"
  ) |>
  cols_merge(
    columns = c(hp, hp_rpm),
    pattern = "{1}<br>@{2}rpm"
  ) |>
  cols_merge(
    columns = c(trq, trq_rpm),
    pattern = "{1}<br>@{2}rpm"
  ) |>
  cols_label(
    mpg_c = "MPG",
    hp = "HP",
    trq = "Torque",
    year = "Year",
    trim = "Trim",
    trsmn = "Transmission",
    msrp = "MSRP"
  ) |>
  fmt_currency(columns = msrp, decimals = 0) |>
  cols_align(
    align = "center",
    columns = c(mpg_c, hp, trq)
  ) |>
  tab_style(
    style = cell_text(size = px(12)),
    locations = cells_body(
      columns = c(trim, trsmn, mpg_c, hp, trq)
    )
  ) |>
  text_transform(
    locations = cells_body(columns = trsmn),
    fn = function(x) {
      
      speed <- substr(x, 1, 1)
      
      type <-
        dplyr::case_when(
          substr(x, 2, 3) == "am" ~ "Automatic/Manual",
          substr(x, 2, 2) == "m" ~ "Manual",
          substr(x, 2, 2) == "a" ~ "Automatic",
          substr(x, 2, 3) == "dd" ~ "Direct Drive"
        )
      
      paste(speed, " Speed<br><em>", type, "</em>")
    }
  ) |>
  tab_header(
    title = md("The Cars of **gtcars**"),
    subtitle = "These are some fine automobiles"
  ) |>
  tab_source_note(
    source_note = md(
      "Source: Various pages within the Edmonds website."
    )
  ) |>
  tab_footnote(
    footnote = md("Best gas mileage (city) of all the **gtcars**."),
    locations = cells_body(
      columns = mpg_c,
      rows = best_gas_mileage_city
    )
  ) |>
  tab_footnote(
    footnote = md("The highest horsepower of all the **gtcars**."),
    locations = cells_body(
      columns = hp,
      rows = highest_horsepower
    )
  ) |>
  tab_footnote(
    footnote = "All prices in U.S. dollars (USD).",
    locations = cells_column_labels(columns = msrp)
  ) 
tab %>% gt:::render_as_html()

@olivroy olivroy marked this pull request as ready for review June 20, 2024 20:46
@rich-iannone
Copy link
Member

All of this is so great. I somewhat regret using so much dplyr in the codebase (though I guess it was faster for development) but going down the road of slowly replacing this with base R will make everything so much faster. Thanks for your dedication here!

Copy link
Member

@rich-iannone rich-iannone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@rich-iannone rich-iannone merged commit 1d1ee57 into rstudio:master Jun 21, 2024
12 checks passed
@olivroy olivroy deleted the lint2 branch June 21, 2024 15:01
@olivroy olivroy mentioned this pull request Jul 24, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants