Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Can we somehow avoid copying large directories that will not be in the package? #59

Closed
gaborcsardi opened this issue Nov 5, 2018 · 18 comments · Fixed by #150
Closed
Labels
feature a feature request or enhancement

Comments

@gaborcsardi
Copy link
Member

From r-lib/rcmdcheck#90

@krlmlr
Copy link
Member

krlmlr commented Dec 20, 2018

R CMD INSTALL seems to use cp -LR at some point, which isn't very helpful. It seems easiest to patch it at the base, so that it uses a more fine-grained copy procedure.

@gaborcsardi
Copy link
Member Author

The alternative is that we copy first, only what we need to, and then call R CMD INSTALL on the temp dir. That's a double-copy, of course...

@krlmlr
Copy link
Member

krlmlr commented Dec 29, 2018

A triple-copy, and we don't get to keep the .o files.

Who from R-core would be supportive of a patch?

@jimhester jimhester added the feature a feature request or enhancement label Mar 20, 2019
@sfirke
Copy link

sfirke commented Apr 7, 2020

I found r-lib/rcmdcheck#90, then this, and they revealed that my newfound problem was caused by an enormous revdep/ folder. Just sharing this for a +1 that I got bit by this and struggled to understand it. If simpler than not copying, could a message be printed if there are large folders present that may cause the delay in check/build?

@statnmap
Copy link

I revive this discussion as I also am interested in pkgdown::build() not copying all files, for check() at least. Maybe when we build we can avoid copying files that are in .Rbuildignore
Then, if CRAN does the same, do you send all directories to CRAN ?
I mean maybe devtools::release() could also send files that are not in .Rbuildignore to be consistent.
At least, a common option could allow the developer to ignore files in Rbuildignore in both these functions.

@gaborcsardi
Copy link
Member Author

Then, if CRAN does the same, do you send all directories to CRAN ?
I mean maybe devtools::release() could also send files that are not in .Rbuildignore to be consistent.

I think the files that are in .Rbuildignore are omitted from the built .tar.gz, no?

@krlmlr
Copy link
Member

krlmlr commented Nov 25, 2020

They are still copied by base R with cp -LR: https://github.com/wch/r-source/blob/trunk/src/library/tools/R/build.R#L1000.

@gaborcsardi
Copy link
Member Author

gaborcsardi commented Nov 25, 2020

@krlmlr Sure, but they won't be in the built .tar.gz, because after the copying they'll be deleted. Otherwise what's the point of .Rbuildignore?

@statnmap
Copy link

statnmap commented Nov 25, 2020

Yes and they are all copied when we do rcmdcheck::rcmdcheck() because of the build here, before building the tar.gz
Which is not necessary and increases the check procedure when you have a big "raw-data" directory

@krlmlr
Copy link
Member

krlmlr commented Nov 25, 2020

...or a revdep/ directory or whatnot.

@gaborcsardi
Copy link
Member Author

Unfortunately this is quite cumbersome to fix, because as mentioned above, the bug is really in base R, pkgbuild just calls R CMD build, which copies everything. Then it applies .Rbuildignore in the copy.

So we would need to re-implement R CMD build, or implement our own .Rbuildignore matching. And in the latter case there would be a double copy of course, because R CMD build will still copy our trimmed copy.

@krlmlr
Copy link
Member

krlmlr commented Nov 25, 2020

...or find a sponsor in R core?

@nrsc
Copy link

nrsc commented Jul 23, 2021

Any updates on this? I think this could be an important element to project builds.

@vspinu
Copy link

vspinu commented Jan 11, 2022

I doubt R core will go for it. The patch would involve essentially implementing a version of rsync -exclude and copying file by file. Too much of a change for a thing that worked for decades with no complains.

I think the biggest animal is "revdeps". So maybe revdeps can store its cache somewhere else. Or simply make pkgbuild warn the user if build is taking unreasonably long and/or if there are very big directories in the project.

@hadley
Copy link
Member

hadley commented May 23, 2022

Just a note that this would also allow us to fix the annoying messages like:

─  checking for empty or unneeded directories
   Removed empty directory ‘devtools/tests/testthat/testMarkdownVignettes/inst’
   Removed empty directory ‘devtools/tests/testthat/testReadme/man’
   Removed empty directory ‘devtools/tests/testthat/testReadme’
   Removed empty directory ‘devtools/tests/testthat/testTest/R/_snaps’
   Removed empty directory ‘devtools/tests/testthat/testTest/tests/testthat/_snaps’
   Removed empty directory ‘devtools/tests/testthat/testVignetteExtras/inst’
   Removed empty directory ‘devtools/tests/testthat/testVignettes/inst’

@krlmlr
Copy link
Member

krlmlr commented May 25, 2022

What if we created a directory tree full of symlinks (on non-Windows) and passed that to R CMD build? That would avoid the double copy.

It seems that we need to:

  • create a recursive listing
  • apply .Rbuildignore
  • trim empty directories
  • create all target directories via unique(dirname(.))
  • copy or link file by file

20 lines of code? What corner cases have I missed?

With a clean base-R-only implementation we could prototype here and suggest a patch to R-core (where it really belongs).

@gaborcsardi
Copy link
Member Author

@krlmlr To me this smells like something that would have a lot of corner cases. But if you implement it, then we can make it opt in and experiment with it.

gaborcsardi added a commit that referenced this issue Nov 25, 2022
This is currently an opt-in feature, and you need
to set the `PKG_BUILD_COPY_METHOD=link` environment
variable ot turn it on. It can also be configured
via an option or a `DESCRIPTION` entry. See
`?build` or `README.md`.

Closes #59.
@gaborcsardi
Copy link
Member Author

There is now a PR for this: #150, and it will be in the next release, happening in a couple of days.

It is currently opt-in. It is very cumbersome to implement this, so feedback is appreciated. Please turn it on via setting the PKG_BUILD_COPY_METHOD=link env var in .Renviron and let me know if it fails for any packages. Be especially careful when building packages for CRAN submissions, because if there is a bug in this, then pkgbuild might leave some files out from your CRAN package.

Maybe it would be useful to have a summary of the excluded files on the screen, while building the package?

Btw. empty directories are currently not excluded, because R CMD build has a flag to keep them, but we can certainly implement their early trimming later.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature a feature request or enhancement
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants