Skip to content

Improve performance of render_html() #1470

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 16 commits into from
Oct 19, 2023

Conversation

mgirlich
Copy link
Contributor

@mgirlich mgirlich commented Oct 16, 2023

Summary

Rendering slightly bigger tables (e.g. 50k rows) as HTML is very slow. The two main culprits are create_body_component_h() and {htmltools} (in particular tagWrite() but also creating the tags themself).

Two small benchmarks

diamonds_1000 <- ggplot2::diamonds |>
  head(1000) |>
  gt::gt()
diamonds <- ggplot2::diamonds |>
  gt::gt()

prepare <- function(data) {
  data <- gt:::build_data(data = data, context = "html")
  data <- gt:::add_css_styles(data = data)
  gt:::create_body_component_h(data = data)
}

bench::mark(
  diamonds_1000 = prepare(diamonds_1000),
  diamonds = prepare(diamonds),
  check = FALSE
)

# CRAN
# # A tibble: 2 × 13
#   expression        min median `itr/sec` mem_alloc `gc/sec` n_itr
#   <bch:expr>    <bch:t> <bch:>     <dbl> <bch:byt>    <dbl> <int>
# 1 diamonds_1000   1.14s  1.14s    0.878     5.37MB     2.63     1
# 2 diamonds       56.42s 56.42s    0.0177  175.21MB     2.11     1

# This PR
# # A tibble: 2 × 13
#   expression         min   median `itr/sec` mem_alloc `gc/sec` n_itr
#   <bch:expr>    <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl> <int>
# 1 diamonds_1000 146.87ms 148.81ms     6.33     7.72MB     6.33     4
# 2 diamonds         7.46s    7.46s     0.134  301.76MB     6.03     1

Checklist

Notes

create_body_component_h() is quite huge. I think it would make sense to split parts of it into separate functions, e.g. a function for the group headings, the summaries, the data itself. I didn't do this as I didn't want to change the style too much.

The PR is probably quite a bit easier to understand by looking at the commits one by one. Otherwise the changes are hard to follow.

There are a couple of places in render_row_data() that deal with spaces and creating HTML. This feels quite tedious and error prone. It would be great to use {htmltools} for such a task but this would require to increase the performance of {htmltools} much more.

I also opened an issue in {htmltools} regarding the performance

@CLAassistant
Copy link

CLAassistant commented Oct 16, 2023

CLA assistant check
All committers have signed the CLA.

@@ -1063,419 +1063,400 @@ create_body_component_h <- function(data) {
}

# Create a default vector of row span values for group labels as a column
row_span_vals <- rep_len(list(NULL), n_cols_total)
row_span_vals <- rep_len(NA_integer_, n_cols_total)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using vectors instead of lists is much nicer to work with and also faster.

group_idx[seq(start, end)] <- i
group_ids[seq(start, end)] <- groups_rows_df$group_id[[i]]
}
groups_list <- as.list(groups_rows_df)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Accessing fields from a list is much faster than fields of a data frame

@rich-iannone
Copy link
Member

@mgirlich This is great work and I really appreciate it!

The only problem I can see is that adding vtcrs as a dependency would bring the total number to 21, which is something that CRAN flags during updates (and ultimately won't, to my knowledge, accept). However, the tibble dependency could be removed with some minor, extra work. If you'd be willing to replace the two instances where tibble is used, then the vtcrs addition will be okay.

From a quick search, tibble functions are used here:

Copy link
Member

@rich-iannone rich-iannone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm!

@rich-iannone rich-iannone merged commit b499967 into rstudio:master Oct 19, 2023
@mgirlich mgirlich deleted the performance-render-html branch October 20, 2023 09:33
AaronGullickson added a commit to AaronGullickson/gt that referenced this pull request Feb 26, 2024
This was added in PR rstudio#1470 but used px instead of pt causing the problem to persist
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants