rows_patch fails when y has key values that don't occur in x #5984

Closed
melissagwolf opened this issue Aug 19, 2021 · 3 comments · Fixed by #6203
Labels
feature a feature request or enhancement tables 🧮 joins and set operations

melissagwolf commented Aug 19, 2021

I tried to use rows_patch to fill in missing values in a data frame. It failed by design, because the input did not meet the function's documented requirement: "key values in y must occur in x". However, I was able to perform the same patch with the match function from base R (see below).

This isn't a typical bug report, because the documentation clearly states this limitation, but I wanted to point out that the operation can be done with base R. Since rows_patch is currently an experimental function in dplyr, I wanted to provide feedback to hopefully improve it. rows_patch is much cleaner and I would prefer to use it if possible.

It would also be nice if rows_patch could preserve other variables in y that don't occur in x. match can handle this as well.

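library(dplyr, warn.conflicts = FALSE)

D1 <- data.frame(
  id = seq(1, 3),
  x = c("cow", NA, "sheep"))

D2 <- data.frame(
  id = seq(1, 4),
  x = c("cow", "turtle", "parrot", "frog"))

# rows_patch fails because id 4 in D2 has no match in D1
D1 %>%
  rows_patch(D2, by = "id")

# match works: look up each missing x in D2 by id
na <- is.na(D1$x)

D1$x[na] <- D2$x[match(D1$id[na], D2$id)]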
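
A possible workaround while the restriction is in place, sketched under the assumption that simply dropping the unmatched keys is acceptable: filter y down to the keys that occur in x with semi_join() before calling rows_patch(). The name D2_matched is only for illustration.

```r
library(dplyr, warn.conflicts = FALSE)

D1 <- data.frame(id = 1:3, x = c("cow", NA, "sheep"))
D2 <- data.frame(id = 1:4, x = c("cow", "turtle", "parrot", "frog"))

# Keep only the rows of D2 whose id also occurs in D1 (drops id 4).
D2_matched <- semi_join(D2, D1, by = "id")

# Every key in y now occurs in x, so the patch requirement is met.
# rows_patch only fills NA values in D1; existing values are kept.
patched <- rows_patch(D1, D2_matched, by = "id")
patched
```

This keeps the call to rows_patch itself unchanged, at the cost of an extra filtering step on y.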

hadley commented Sep 16, 2021

Reprex:

library(dplyr, warn.conflicts = FALSE)
D1 <- data.frame(
  id = seq(1, 3),
  x = c("cow", NA, "sheep")
)

D2 <- data.frame(
  id = seq(1, 4),
  x = c("cow", "turtle", "parrot", "frog")
)

# rows_patch fails
D1 %>%
  rows_patch(D2, by = "id")
#> Error: Attempting to patch missing rows.

# match works
na <- is.na(D1$x)

D1$x[na] <- D2$x[match(D1$id[na], D2$id)]

Created on 2021-09-16 by the reprex package (v2.0.0)
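
The match() lookup above is essentially a keyed join followed by filling in the gaps. A rough sketch of the same idea with left_join() and coalesce(), which also carries over extra columns from y as the original request asked for. The legs column is a made-up addition for illustration and is not part of the report.

```r
library(dplyr, warn.conflicts = FALSE)

D1 <- data.frame(id = 1:3, x = c("cow", NA, "sheep"))
D2 <- data.frame(
  id = 1:4,
  x = c("cow", "turtle", "parrot", "frog"),
  legs = c(4, 4, 2, 4) # hypothetical extra variable that only exists in y
)

# left_join keeps every row of D1 and ignores ids that only occur in D2.
# The shared column x is disambiguated as x.x (from D1) and x.y (from D2).
joined <- left_join(D1, D2, by = "id")

# coalesce takes x.x where it is present and falls back to x.y otherwise.
patched <- joined %>%
  mutate(x = coalesce(x.x, x.y)) %>%
  select(id, x, legs)
patched
```

Unlike rows_patch, this approach returns the additional columns of y for the matched rows.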

@hadley hadley added feature a feature request or enhancement tables 🧮 joins and set operations labels Sep 16, 2021

hadley commented Sep 16, 2021

Might be worthwhile to have @DavisVaughan think about these functions once he's done with his current work on joins.

DavisVaughan commented

This would probably be fixed by #5588, if this comment in that PR is still applicable (#5588 (comment)):

I think it makes sense that keys in y that do not exist in x should always be silently ignored when updating/deleting/patching (no changes needed - we do this in this PR).

@mgirlich followed up that it might be useful to make that an optional check, rather than always silently ignoring them (#5588 (comment)).
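
For readers arriving later: as far as I can tell, the optional check described here is available in newer dplyr releases as the unmatched argument of rows_patch(), rows_update() and rows_delete(). The sketch below assumes dplyr 1.1.0 or later; treat the exact version as an assumption and check ?rows_patch for your installation.

```r
library(dplyr, warn.conflicts = FALSE)

# Assumption: the `unmatched` argument requires dplyr >= 1.1.0.
stopifnot(packageVersion("dplyr") >= "1.1.0")

D1 <- data.frame(id = 1:3, x = c("cow", NA, "sheep"))
D2 <- data.frame(id = 1:4, x = c("cow", "turtle", "parrot", "frog"))

# The default, unmatched = "error", still fails on id 4.
# unmatched = "ignore" silently skips keys in y that do not occur in x.
patched <- rows_patch(D1, D2, by = "id", unmatched = "ignore")
patched
```

Keeping "error" as the default preserves the original safety check, while "ignore" gives the behaviour requested in this issue.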


3 participants