Add the function rows_add() #1323

Merged: rich-iannone merged 54 commits into master from add-rows on May 25, 2023

rich-iannone
Member

This PR adds a new function called rows_add(). It lets the user supply new row data through name-value pairs. New rows are added to the bottom of the table by default, but they can be placed elsewhere in the table by using either the .before or .after arguments. If entirely empty rows are needed, the .n_empty option specifies the number of blank (i.e., all NA) rows to insert into the table.

Here is a basic example, where a new row is added to the bottom of the gt table.

exibble |>
  gt(rowname_col = "row") |>
  rows_add(
    row = "row_9",
    num = 9.999E7,
    char = "ilama",
    fctr = "nine",
    group = "grp_b"
  )

If you want to place a row somewhere in the middle of the table, you can use either the .before or .after argument in rows_add():

exibble |>
  gt(rowname_col = "row") |>
  rows_add(
    row = "row_4.5",
    num = 9.923E3,
    char = "elderberry",
    fctr = "eighty",
    group = "grp_a",
    .after = "row_4"
  )

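The .n_empty option described at the top has no example here, so this is a minimal sketch of it. It assumes .n_empty can be combined with .after in the same way as named row values; the choice of two rows and the name tbl_spaced are only for illustration.

```r
library(gt)

# Insert two blank (all-NA) rows after "row_4" to act as spacer rows.
# Assumption: .n_empty works together with .after, as the description suggests.
tbl_spaced <-
  exibble |>
  gt(rowname_col = "row") |>
  rows_add(.n_empty = 2, .after = "row_4")

tbl_spaced
```

This is the "spacer rows" use case from the linked issue: no values are supplied, only a count and a position.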

Another application is starting from nothing (really just the definition of columns) and building up a table using nothing but rows_add(). This might be useful in interactive or programmatic applications. Here's an example where two columns are defined with dplyr's tibble() function but no rows are present initially; with two calls of rows_add(), two separate rows are added.

dplyr::tibble(
  time = lubridate::POSIXct(),
  event = character(0)
) |>
  gt() |>
  rows_add(
    time = lubridate::ymd_hms("2022-01-23 12:36:10"),
    event = "start"
  ) |>
  rows_add(
    time = lubridate::ymd_hms("2022-01-23 13:41:26"),
    event = "completed"
  )

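For the programmatic case, the same pattern could be driven by a loop. This is only a sketch: the events list (including the "checkpoint" entry) and the name tbl_events are hypothetical and stand in for whatever produces the rows in a real application.

```r
library(gt)

# Hypothetical event log; in practice this could come from user input
# or from a running process.
events <- list(
  list(time = lubridate::ymd_hms("2022-01-23 12:36:10"), event = "start"),
  list(time = lubridate::ymd_hms("2022-01-23 13:05:42"), event = "checkpoint"),
  list(time = lubridate::ymd_hms("2022-01-23 13:41:26"), event = "completed")
)

# Start from column definitions only (zero rows), as in the example above.
tbl_events <-
  dplyr::tibble(
    time = lubridate::POSIXct(),
    event = character(0)
  ) |>
  gt()

# Append one row per event with repeated rows_add() calls.
for (ev in events) {
  tbl_events <- rows_add(tbl_events, time = ev$time, event = ev$event)
}

tbl_events
```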

Fixes: #698

@rich-iannone rich-iannone requested a review from cderv May 9, 2023 13:54
Collaborator

@cderv cderv left a comment

I think I need a bit more context about this new feature. First, let me share my understanding:

  • I see gt as a visualization tool (like ggplot) but for tables. So I would expect to find in gt only features and functions related to how tables are represented, plus styling and layout tweaks.
  • Table data is, for me, something that should be prepared upstream in the pipeline and then passed to gt.

So why reimplement a function to add rows in gt? There is already one in tibble::add_row(), even reexported as dplyr::add_row(), and it seems you don't use it in the function either. Is this on purpose? Is it hard to program with, maybe?

I understand the motivation for the original issue, "add spacer rows", as I see that as a layout issue, but isn't maintaining your own add-rows function in gt too much?

Maybe I'm missing the point, and there is some specific use case you have in mind. I just prefer to have the full context before doing a more thorough review.
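
For reference, the upstream approach described above might look like the following sketch, where the row is added with tibble::add_row() before the data reaches gt(). It mirrors the .after example from the PR description; note that add_row() takes a row index for .after rather than a row name, and the name tbl_upstream is only for illustration.

```r
library(gt)

# Add the row to the data first, then hand the finished tibble to gt().
# Columns not named here (fctr, date, time, ...) are filled with NA by add_row().
tbl_upstream <-
  exibble |>
  tibble::add_row(
    row = "row_4.5",
    num = 9.923E3,
    char = "elderberry",
    group = "grp_a",
    .after = 4
  ) |>
  gt(rowname_col = "row")

tbl_upstream
```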

@rich-iannone
Member Author

rich-iannone commented May 16, 2023

Absolutely, this is worth some explanation. When we originally planned the API for gt (in a Google doc, some time back in early 2018), the thinking was that all the data would come into gt and be immutable. The only thing left for gt to do was to lightly format values, style parts of the table, and ensure that different outputs were supported. However, it was soon apparent that people wanted to do things a bit outside of that scope. We soon had the ability to move columns around, arrange rows within different groups, and add summary rows.

These were all things that could be done in dplyr! It felt to some that this was a design mistake and there should be a clear line between data manipulation and data presentation. But, after some time, and quite a lot of experimentation, we found that doing these data manipulation things in gt felt pretty good and natural. It's hard to make a table summary and add it to an existing table in dplyr (we tried it, not fun!). It turned out that producing that data in gt was not such a bad thing. And later, with extra development, we made it even better by combining formatting with the creation of summary rows.

It turns out that the community pushed us in these weird directions. It was uncomfortable at first, but it's what they really wanted (and they're using these features a lot, it seems). A lot of this can feel as though the authorship is less focused, but we see some of the same design decisions in ggplot (like fitting a trendline through a scatterplot; that data was manufactured inside the API).

It's much the same with the rows_add() feature. It breaks with the idea that data coming in shouldn't change much. But people have different ideas. Early on, some people commented that gt should use dplyr idioms and mutate the table readily. Some have asked for a mutate() function in gt. Some people initially didn't like that gt should get a summary_rows() function. I guess the point is, there's going to be a lot of different requests and you have to weigh which of those are reasonable (and have a future) and which just don't make sense. I'm starting to come around to the idea of a rows_add() function (I guess we're past that point, this is the PR for it!) but two months ago I would have said 'forget about it'. Now, however, I think this function has a good future. The maintenance won't be too difficult, it opens up the possibility for interesting table mutations, and we can extend the function later with interesting gt-specific features that many will like a lot.

Collaborator

@cderv cderv left a comment

LGTM. I think this is fine to merge now. I think you know what you're doing here, and I am fairly new to the project.

It seems dplyr::bind_rows() is more useful to you than tibble::add_row() or dplyr::add_row().

All good!

@rich-iannone
Member Author

Thanks Christoph! It's good to have a second pair of eyes and I appreciate you taking the time to take a look at this.

@rich-iannone rich-iannone merged commit 2826a47 into master May 25, 2023
@rich-iannone rich-iannone deleted the add-rows branch May 25, 2023 05:01

Successfully merging this pull request may close these issues.

Spacer Rows
2 participants