Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

copy shared columns before setkey #3768

Merged
merged 10 commits into from
Aug 27, 2019
Merged

copy shared columns before setkey #3768

merged 10 commits into from
Aug 27, 2019

Conversation

MichaelChirico
Copy link
Member

@MichaelChirico MichaelChirico commented Aug 15, 2019

Closes #3496
Closes #3766

See also: follow-up @ #3791

@codecov
Copy link

codecov bot commented Aug 15, 2019

Codecov Report

Merging #3768 into master will increase coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #3768      +/-   ##
==========================================
+ Coverage   99.41%   99.41%   +<.01%     
==========================================
  Files          71       71              
  Lines       13197    13224      +27     
==========================================
+ Hits        13120    13147      +27     
  Misses         77       77
Impacted Files Coverage Δ
R/setkey.R 100% <100%> (ø) ⬆️
src/utils.c 100% <100%> (ø) ⬆️
R/data.table.R 100% <100%> (ø) ⬆️
src/reorder.c 100% <100%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 81af9b6...f7031ba. Read the comment docs.

@MichaelChirico
Copy link
Member Author

Wasn't sure about the changes to data.table.R, but incidentally, they fix #2245. So I'll leave those changes & add a test to close that.

@MichaelChirico MichaelChirico changed the title Closes #3496 and #3766 -- detect & block setkey from double-reordering identical columns Closes #3496, #3766, and #2245 -- detect & block setkey from double-reordering identical columns Aug 18, 2019
Copy link
Member

@jangorecki jangorecki left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so maybe we can try revert changes to data.table.R and see if it will still do what is needed?

@mattdowle mattdowle added this to the 1.12.4 milestone Aug 26, 2019
…ause. Faster too by avoiding address() at R level which creates character strings for addresses and avoiding hash table of duplicated(); only noticeable for very many columns.
@mattdowle mattdowle changed the title Closes #3496, #3766, and #2245 -- detect & block setkey from double-reordering identical columns copy shared columns before setkey Aug 27, 2019
@mattdowle
Copy link
Member

Great fix. I just followed up by creating copySharedColumns at C level. Should be faster when there are many columns by avoiding creating a character vector of each column's address to be passed to duplicate() which then creates a hash table on the character strings. We can use this copySharedColumns at other places at C level too; e.g. in :=. Since I guess there's also a problem with using := on a shared memory column (both columns get updated)?
As you wrote in the heading comment, it should be fine to return shared memory columns from j. So yes the data.table.R changes in this PR (other than the part which fixes .SD lock) could be reverted as you said.

@@ -173,3 +173,39 @@ bool Rinherits(SEXP x, SEXP char_) {
return ans;
}

void copySharedColumns(SEXP x) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

very nice!

@MichaelChirico
Copy link
Member Author

OK. Would have to sit back down with this to figure which part exactly is needed to solve #2245...

@MichaelChirico
Copy link
Member Author

actually I think what makes sense is to remove the stuff in data.table.R to it's own PR. the issues are really separate now & deserve separate PRs

@renkun-ken
Copy link
Member

I notice that I raised #2162 long time ago which is a similar but different case. I guess it's impossible to also address that at the moment?

@MichaelChirico
Copy link
Member Author

Thanks Kun. Your issue in fact looks the same and should be solved by this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Incorrect setkey result error using function setkey
4 participants