Optionally roxygenize (or execute custom R code) before build #397

Closed
krlmlr opened this issue Dec 17, 2013 · 16 comments

Comments

@krlmlr
Member

krlmlr commented Dec 17, 2013

As requested in #43 (comment).

In order to avoid having to manage .Rd files on GitHub, at least the install_github() routine (and probably also build()) should support roxygenization before build. Execution of custom R code is a plus, but not necessary for this feature.

To start, I'd suggest supporting a field Devtools: list(document = TRUE) in DESCRIPTION. During build this field is examined, and devtools::document('.') is executed after unpacking but before the call to R CMD build. As devtools already suggests roxygen2, no immediate action in terms of dependency management is needed.
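A minimal sketch of how such a check could look, assuming the proposed field holds R syntax exactly as written above. The helper names `wants_document()` and `pre_build()` are hypothetical and not part of devtools; only `read.dcf()` and `devtools::document()` are existing functions.

```r
# Hypothetical helper: reports whether DESCRIPTION requests documentation
# to be regenerated before build, via the proposed `Devtools:` field.
wants_document <- function(pkg_dir = ".") {
  desc_path <- file.path(pkg_dir, "DESCRIPTION")
  fields <- read.dcf(desc_path, fields = "Devtools")
  raw <- fields[1, "Devtools"]
  # read.dcf() yields NA for a field that is absent from the file.
  if (is.na(raw)) return(FALSE)
  # The proposal stores R syntax in the field, so parse and evaluate it.
  # Evaluating text shipped with a package is a trust decision; a real
  # implementation would likely restrict what is allowed here.
  opts <- eval(parse(text = raw), envir = baseenv())
  isTRUE(opts$document)
}

# Hypothetical pre-build step: run roxygen2 through devtools only when the
# field asks for it and devtools is available.
pre_build <- function(pkg_dir = ".") {
  if (wants_document(pkg_dir) && requireNamespace("devtools", quietly = TRUE)) {
    devtools::document(pkg_dir)
  }
  invisible(pkg_dir)
}
```

In this sketch a package without the field is left untouched, so existing packages would build as before.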

Build-time dependencies could be supported by a new field BuildDepends: in DESCRIPTION, with the same syntax as Depends:. For the given use case this is probably overkill, but it could be useful for specifying generic pre-build code by e.g. Devtools: list(pre_build = Rd2roxygen::rab('.')).

The document parameter to check could be made obsolete.

@hadley
Member

hadley commented Dec 20, 2013

I think it's important to have a BuildDepends field because that can be used to check that you have (e.g.) the right version of roxygen2.

@krlmlr
Member Author

krlmlr commented Dec 26, 2013

Then there might be no need to support the simple document = TRUE syntax.

@luckyrandom

I think the cause of all the trouble is that one repository on GitHub is used for both source management and package distribution. The Rd files should never be tracked by git in the source repository; on the other hand, the Rd files must be tracked and shared through GitHub in the package repository.

Conceptually, we have something like,

       prebuild, such as generate Rd
  Src ====================================> Package

I would like to call the step from source to package "build", but build means something different in R, so let's call it prebuild here.

One solution is to use two repositories: one for source files, such as RPackageFoo-dev, and one for package distribution, such as RPackageFoo. Then, with some script, such as a git hook, we can update the distribution repository automatically. In this setup, everyone can install the package from the distribution repository without any problem, and the developers, who use the source repository, need to handle the prebuild step themselves. The cost is that we have to maintain two repositories, and some users and developers may not notice that the package has two repositories.

Another solution is to embed prebuilding into devtools and let the users handle prebuilding themselves. Hopefully, with the help of devtools, it is painless for both developers and users. To make it really work as expected, the prebuild procedure should be easy enough to run on any machine, including Windows. If devtools has to handle build dependencies, and users may have to install a bunch of packages or even different versions of one package, then it is too complex to push the job to the users, and the developers should do the heavy lifting if possible.

Or, we can keep using one repository for both source management and package distribution, and write some scripts and configuration to deal with "compiled" files in git easily. I found the article Dealing With Compiled Files in Git useful. The basic setup is:

  • set .gitattributes so that git diff skips all the compiled files

  • set a git merge driver so that git merge always uses the local version of compiled files

  • set a git hook so that the Rd files are updated before pushing to GitHub

    The potential pitfall is that the git merge driver and git hook cannot be synced by git and must be set up manually by every developer. The good news is, I guess, that improper settings by another developer will not mess up your repository. Another problem is that you may not be able to handle pull requests on GitHub, as it doesn't support merge drivers. Actually, the only way to make sure the Rd files are updated properly is to precompile locally and push to GitHub. I don't think I can foresee all possible issues, as this is somewhat too complex for me.

@krlmlr
Member Author

krlmlr commented Jan 24, 2014

The beauty of install_github is that you can install any ref or even pull request. Any mirroring technique will have to provide this to be of equivalent use.

I agree that pre-building in devtools has to be painless for the user.

The article about storing compiled files is a nice solution, but the setup is slightly complicated. Not all updates of .Rd files cause a merge conflict. If a small tool would take care of those details -- great!

@krlmlr
Member Author

krlmlr commented Feb 10, 2014

@hadley: Can we agree on making the BuildDepends dependency optional for build and install but mandatory for check and release? I think otherwise the requirement that "pre-building has to be painless for the user" cannot be satisfied easily. It's "only" documentation, if the user needs it he can take appropriate action to ensure it is installed; if not there's no need to abort installation because a dependency is missing.
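The policy suggested here could be sketched roughly as follows. The function `check_build_deps()` and its `action` argument are hypothetical names used only for illustration; nothing like this is claimed to exist in devtools.

```r
# Hypothetical policy check: a missing build-time dependency is only a
# warning for build/install, but an error for check/release.
check_build_deps <- function(packages,
                             action = c("build", "install", "check", "release")) {
  action <- match.arg(action)
  available <- vapply(packages, requireNamespace, logical(1), quietly = TRUE)
  missing_pkgs <- packages[!available]
  if (length(missing_pkgs) == 0) return(invisible(TRUE))

  msg <- paste0("Missing build-time dependencies: ",
                paste(missing_pkgs, collapse = ", "))
  if (action %in% c("check", "release")) {
    stop(msg, call. = FALSE)
  }
  # For build/install the documentation step would simply be skipped.
  warning(msg, call. = FALSE)
  invisible(FALSE)
}
```

With this split, a user who only installs the package is never blocked by a missing documentation tool, while a maintainer running check or release is.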

@luckyrandom

I have been playing with the idea of an automated build-and-deploy tool for travis-ci. The project repository is available at r-deploy-git. It's at an early stage, but it seems to work well with my dummy R package. At least, it shows that it's doable. Hopefully, a reliable script will be available in the near future, with everyone's help.

For devtools, I think it would be a good idea to have a way to distinguish a "source" package from a "prebuilt" package. For a "source" package, prebuild must be called to generate files, such as Rd files, and the package developers can use any tools they want, including make, as we assume that anyone installing a "source" package is a developer or power user. A "prebuilt" package must follow the CRAN standard and can be installed with R CMD INSTALL.

@wch
Member

wch commented Apr 23, 2014

One problem with running document() as part of the install_github() process is that the resulting .Rd files are not fully determined by the contents of the package source. They can depend on external packages, for example:

  • If you use @inheritParams from something from a different package, the version of that package matters. Sometimes documentation changes between versions. (This is an issue I've encountered in the past.)
  • The output .Rd files depend on the version of roxygen2.

On one hand, you might think this is good, because these changes to external packages don't result in "extra" commits to your package's source code. But on the other hand, there's a big problem: if you are releasing a package to CRAN, there's no commit that corresponds to the exact contents of what's built and sent to CRAN, because what's sent depends on both the source code and the R ecosystem on your computer.

@krlmlr
Member Author

krlmlr commented Apr 23, 2014

You could still maintain a branch of "released" versions that does contain the roxygenized files:

A===B===C===D===E===F===G===...
 \                   \
  R1==================R2=========...

Here, A through G represent the development branch, and R1 and R2 are releases; the .Rd files are contained only in the release branch.

The other issues you mentioned could be mitigated by a packrat-like approach. (Really using packrat would require rstudio/packrat#31.) While this seems overkill just for the task at hand, this also allows "controlled upgrading" of dependencies -- see the comments in the linked issue.

@wch
Member

wch commented Jul 14, 2014

This issue and #523 are starting to make me think that maybe we should offer this as an option...

I've also realized that another possible end-user issue is that, when re-documenting, the entries in NAMESPACE can depend on which packages are currently loaded. This is something I've seen with S3 methods and generics.

@gaborcsardi
Member

I think this could be handled by including some R code in the GitHub tree that is run by devtools::install_github (or rather devtools::install in general).

There are a couple of questions about the how:

  • Where to put the R code. This is probably simple, and it could just go into a file that is ignored by build(), e.g. build_package.R.

  • How to avoid the problem @wch mentioned. This could be done by fixing the versions of the builder package, e.g. adding a BuildDependencies field to DESCRIPTION, that can specify the exact version of the package you want to use for building. Examples:

    BuildDependencies: roxygen2 (= 4.0.1), Rcpp (>= 0.11)
    

    or even this to depend on an exact version:

      BuildDependencies: http://github.com/hadley/roxygen/7759478a86f803c5222c19189a022134de251ccb
    

    This would also solve the problem we were having sometimes when various packages use different versions of roxygen2, and then various developers also have various versions.

    I just read over this thread, and realized that @hadley was suggesting essentially the same at the very beginning......
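To make the version-pinning idea concrete, here is a rough sketch that parses a field value like the first example above and compares a requirement against a version. `parse_build_deps()` and `dep_satisfied()` are hypothetical helpers, and treating `=` as an exact pin is an assumption taken from the example. URL-style entries such as the second example are not handled.

```r
# Hypothetical parser for the proposed BuildDependencies field.
# Returns one row per package with optional operator and version.
parse_build_deps <- function(field) {
  entries <- trimws(strsplit(field, ",", fixed = TRUE)[[1]])
  has_ver <- grepl("(", entries, fixed = TRUE)
  package <- sub("[[:space:]]*\\(.*$", "", entries)
  op <- ifelse(
    has_ver,
    sub("^.*\\([[:space:]]*([<>=!]+).*$", "\\1", entries),
    NA_character_
  )
  version <- ifelse(
    has_ver,
    sub("^.*\\([[:space:]]*[<>=!]+[[:space:]]*([^) ]+)[[:space:]]*\\).*$", "\\1", entries),
    NA_character_
  )
  data.frame(package = package, op = op, version = version,
             stringsAsFactors = FALSE)
}

# Hypothetical check of one requirement against a given version string.
dep_satisfied <- function(op, required, installed) {
  if (is.na(op)) return(TRUE)          # no constraint given
  if (op == "=") op <- "=="            # assumption: "=" means an exact pin
  stopifnot(op %in% c("==", "!=", ">=", "<=", ">", "<"))
  compare <- match.fun(op)
  compare(package_version(installed), package_version(required))
}
```

A caller could combine the two with `utils::packageVersion()` to decide whether the locally installed builder packages match what the package author used.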

@tbates

tbates commented Sep 9, 2014

Glad if people have the time to implement this, I can see the use and I get that people have religion about tracking built files...

But for most users, building Rd files locally will be a source of many intractable errors. That will then lead to them bugging package authors and Hadley... Might be wise to prominently recommend storing the pre-built .Rd files if non-experts are using your package?

@krlmlr
Member Author

krlmlr commented Sep 9, 2014

This is similar to packages that need compilation: on Windows, you need Rtools. I think non-expert users shouldn't have to use install_github in the first place, but this requires some infrastructure, e.g., as outlined in rpkg/rep.

@tbates

tbates commented Sep 10, 2014

amen to avoiding Rtools for windows users. "CRAN 2.0" sounds good, although the severe checking they impose does give a boost to confidence about code integrity.

@hadley
Member

hadley commented Apr 21, 2015

I think this is best handled elsewhere. Currently install_github() is a very common way to install packages and having to run additional code before installation is going to add a lot of complexity.

@hadley hadley closed this as completed Apr 21, 2015
@krlmlr
Member Author

krlmlr commented Jun 7, 2015

Would you support an argument document = TRUE for build()?

@lock

lock bot commented Sep 18, 2018

This old issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with reprex) and link to this issue. https://reprex.tidyverse.org/

@lock lock bot locked and limited conversation to collaborators Sep 18, 2018