Understanding the value of .I for non-matching rows when using .EACHI #5457

Open
Henrik-P opened this issue Sep 6, 2022 · 3 comments

@Henrik-P

Henrik-P commented Sep 6, 2022

I join two data.tables, use .I in j, and by = .EACHI. When a row in i has no match to x the result is 0. I wish to understand why this is the case.

Some toy data:

d1 = data.table(v = c("A", "B", "C", "A", "C"))

# add column identical (value-wise) to .I
d1[ , i := .I]

d2 = data.table(v = c("D", "A", "G", "C"))

d1
#    v i
# 1: A 1
# 2: B 2
# 3: C 3
# 4: A 4
# 5: C 5

d2
#    v
# 1: D
# 2: A
# 3: G
# 4: C

Join the two tables on 'v'. In j, call either "i" or .I. Use by = .EACHI ("evaluates j for the groups in 'DT' that each row in i joins to").

When j is "i" (which at least "looks the same" as .I), non-matched rows evaluate to NA. To me, this seems consistent with the default nomatch behaviour: "When a row in i has no match to x, nomatch=NA (default) means NA is returned":

d1[d2, on = .(v), i, by = .EACHI]
#    v  i
# 1: D NA # unmatched row in `i` evaluates to NA
# 2: A  1
# 3: A  4
# 4: G NA # unmatched row in `i` evaluates to NA
# 5: C  3
# 6: C  5

On the other hand, when j is .I, non-matched rows evaluate to 0:

d1[d2, on = .(v), .I, by = .EACHI]
#    v I
# 1: D 0 # unmatched row in `i` evaluates to 0
# 2: A 1
# 3: A 4
# 4: G 0 # unmatched row in `i` evaluates to 0
# 5: C 3
# 6: C 5

From ?.I:

While grouping, it holds for each item in the group, its row location in x

However, I fail to find documentation on how unmatched rows in i evaluate to 0 when j = .I. Can someone help me understand this seemingly inconsistent behaviour?
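If NA rather than 0 is wanted for the unmatched rows, two possible workarounds can be sketched as below. This is only a sketch under assumptions: the first form relies on .I evaluating to a single 0 for unmatched groups, as in the output reported above, and the names idx_which and res are made up for illustration. The second form uses the documented which = TRUE argument, which returns the row numbers of x that each row of i matches and follows the default nomatch = NA.

```r
library(data.table)

d1 = data.table(v = c("A", "B", "C", "A", "C"))
d2 = data.table(v = c("D", "A", "G", "C"))

# Workaround 1 (assumes .I is 0 for unmatched groups, as reported above):
# replace the 0 by NA inside j
res = d1[d2, on = .(v),
         .(I = replace(.I, .I == 0L, NA_integer_)),
         by = .EACHI]

# Workaround 2: ask for the matching row numbers of x directly;
# unmatched rows of i follow nomatch = NA
idx_which = d1[d2, on = .(v), which = TRUE]
```

Both forms should give NA for "D" and "G" and the row locations 1, 4, 3, 5 for the matched values, but neither explains why .I itself evaluates to 0.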


R version 4.2.1 (2022-06-23 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19044)

Tried on:
data.table_1.14.2 &
data.table 1.14.3 IN DEVELOPMENT built 2022-07-20 18:26:12 UTC

@jangorecki
Member

Worth trying on devel as well, just to ensure it hasn't changed since 1.14.2.

@Henrik-P
Author

Henrik-P commented Sep 6, 2022

Thanks @jangorecki, I forgot to include that I also attempted with devel version. Updated the post.

@Henrik-P
Author

Henrik-P commented Sep 7, 2022

I just found a related open issue: With by=.EACHI and unmatched i, can we set nomatch= to get .SD[0] instead of .SD[NA]?, with a comment on the same result as here:

.I correctly (?) displays 0 for unmatched group
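For comparison, here is a small sketch of what .N gives on the same toy data with by = .EACHI. It is offered only as a side-by-side observation, not as a confirmed explanation of the .I result: .N counts the rows of x in each group, so groups for unmatched rows of i are reported as having 0 rows. The name n_by_i is made up for illustration.

```r
library(data.table)

d1 = data.table(v = c("A", "B", "C", "A", "C"))
d2 = data.table(v = c("D", "A", "G", "C"))

# Number of rows of d1 that each row of d2 joins to;
# one result row per row of d2
n_by_i = d1[d2, on = .(v), .N, by = .EACHI]
n_by_i
#    v N
# 1: D 0
# 2: A 2
# 3: G 0
# 4: C 2
```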
