Style Guide
We have our own "mlr-style" which can be automatically applied to code via the styler package.
Follow these steps to format your code:
1. Install {styler.mlr}: `remotes::install_github("mlr-org/styler.mlr")`
2. Apply the style either
   2.1 to the whole package: `styler.mlr::style_pkg()`
   2.2 to a specific file: `styler.mlr::style_file(<file>)`
   2.3 via the RStudio addin to style the "active file"

When using 2.3, make sure you've set the following option in `.Rprofile`:
options(styler.addins_style_transformer = "styler.mlr::mlr_style()")
You can make this more dynamic with the following:
if (grepl("mlr", getwd()) || grepl("paradox", getwd())) {
  options(styler.addins_style_transformer = "styler.mlr::mlr_style()")
}
With this setting, the addin uses the default `tidyverse_style` for all projects except the ones whose working directory contains "mlr" or "paradox" in its path.
Styling can be automated via pre-commit, which is a library that triggers useful pre-commit hooks.
To use pre-commit, do the following:
- Install the package and set it up via `install.packages("precommit")` followed by `precommit::use_precommit()`.
- Adjust the created `.pre-commit-config.yaml` file to your needs. Changing the setting that does the styling is especially important; otherwise you'll style with the tidyverse style. You can also copy one of our existing configs, see here for an example.
- Every once in a while you should update the hooks. To do so, call `precommit::autoupdate()`.
We mainly use Hadley's Advanced R Style Guide with slight modifications and comments.
Code and documentation are always written in English, never in German, French or any other language. The same holds for file and directory names.
Put every statement / command on its own line. Do not put a semicolon at the end of a statement. This is R, not C.
Bad:
x = 1;
x = 1; y = 2; z = 3
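For contrast, a sketch of the same assignments written the preferred way, one statement per line and without trailing semicolons:

```r
# Good: one statement per line, no trailing semicolons
x = 1
y = 2
z = 3
```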
Name functions, variables and arguments in lowercase with separating underscores, e.g. `my_arg = 1` and `do_that(my_arg)`. R6 class names, however, are in CamelCase, e.g. `MyNiceClass`.
Use `=` instead of `<-` for assignments.
Use a single `#` (not two `##`), then one space, on the same level of indentation as the code you comment, to start a comment line.
Usually, you should not put a comment on the same line as the code you comment.
Combine meaningful identifier names and well written code, which is as self-documenting as possible,
with short, precise lines of comments. Complicated stuff needs lengthier comments.
No or too few comments are bad, but too verbose or unnecessary comments are also (less) bad.
Usually, it is good style to prefix smaller "blocks of code", e.g., half a page of a `for` loop, where you "do a certain thing", with 1-2 comment lines that explain what is going to happen now.
Define strings with double quotes, so `"hello"` instead of `'hello'`.
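The naming, assignment, comment and string rules above can be sketched together as follows. The function `count_words()` and all variable names are hypothetical and only serve as illustration.

```r
# Count the whitespace-separated words in a character string.
count_words = function(input_text) {
  # split on runs of whitespace, then drop empty fragments
  words = strsplit(input_text, "[[:space:]]+")[[1L]]
  words = words[nzchar(words)]
  return(length(words))
}

n_words = count_words("hello style guide")
```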
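If you mean an integer, write it with the `L` suffix, e.g. `1L` instead of `1`, because `1` actually means `1.0`, a numeric, in R.
One notable exception is the sequence constructor `:`, which creates integers for whole-number bounds such as `1:3`.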
`1:3` is already a vector, so wrapping it in `c()` is unnecessary.
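The difference between numeric and integer values can be checked directly in an R session. A small sketch using only base R:

```r
# a bare number literal is a double, the L suffix makes it an integer
is.integer(1)    # FALSE
is.integer(1L)   # TRUE

# the sequence constructor already yields an integer vector here
is.integer(1:3)  # TRUE

# wrapping the sequence in c() adds nothing
identical(c(1:3), 1:3)  # TRUE
```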
Put a single space between `if`, `while`, `repeat` and the opening parenthesis `(` that follows.
Do not write `if (ok == TRUE)` or `if (ok == FALSE)`. If `ok` already is a boolean value, write `if (ok)` or `if (!ok)`, respectively.
If the body of the statement consists of only one line, the language allows us to omit the curly braces.
This can be a good thing if it keeps the code together (less scrolling is better reading and understanding), but you should only use this when the code structure is very simple, not with, e.g., complicated, nested `if` statements.
If you use it, always put the single line on a separate line and indent it.
If in doubt, always use the braces.
If you use curly braces with `else`, the curly brace before the `else` goes on the same line as the `else`.
Good:
if (condition) {
  ...
}

if (condition) {
  ...
} else {
  ...
}

for (i in 1:10) {
  ...
}

while (not.done) {
  ...
}

# in rare cases OK
if (condition)
  x = 1
Bad:
if(condition) {
  ...
}

if (condition) {
  ...
}
else {
  ...
}
Try to explicitly use the `return` statement and do not rely on R's convention to return the value of the last evaluated expression of the called function, especially if your function is longer and you `return` in multiple places.
Deviate if your function is shorter, e.g., for short anonymous functions.
The basic distinction is whether you have used an imperative or a functional coding style
for the respective function. R allows both and I mix both styles heavily.
If your function is more like a procedure, i.e., it has no meaningful return value, return `invisible(NULL)`.
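A minimal sketch of the three cases described above: a function with several exit points, a short anonymous function, and a procedure. All function names are hypothetical.

```r
# longer function with several exit points: use explicit return()
describe_sign = function(x) {
  if (x > 0) {
    return("positive")
  }
  if (x < 0) {
    return("negative")
  }
  return("zero")
}

# short anonymous function: relying on the last expression is fine
squares = vapply(1:3, function(i) i^2, numeric(1L))

# procedure without a meaningful return value
print_banner = function(msg) {
  cat("==", msg, "==\n")
  return(invisible(NULL))
}
```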
Do not put arbitrary empty lines in your code, but instead use them sparingly to structure your code into "blocks of actions" that make sense. Usually, you want to put at least a short comment line before such a block that explains its contents, see next point. This structuring guides readers and allows them to catch their breath.
Try to put one single function definition into one .R file. Name the file like the function. If you have some very short helper functions you can deviate from this.
Good functions very often cover 1 to 3 screen pages. Of course, some complicated stuff sometimes is longer. If that happens, think about introducing another level of indirection, e.g., more functions or data types. Maybe this is a good time for refactoring? If your function or source file covers 5000 lines of code (Have seen those. Not just once.) you are doing it wrong - and your code will not be maintainable.
Defining a function inside another function can be OK if the inner function is only used in this context and is pretty simple. Otherwise, try to avoid it.
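A sketch of an inner function that would be acceptable under this rule, because it is short and only used inside its enclosing function (all names are hypothetical):

```r
normalize_columns = function(mat) {
  # small helper, only needed here: rescale a vector to the range [0, 1]
  # (assumes the vector is not constant, otherwise this divides by zero)
  rescale = function(v) {
    return((v - min(v)) / (max(v) - min(v)))
  }

  return(apply(mat, 2L, rescale))
}

scaled = normalize_columns(matrix(c(1, 2, 3, 10, 20, 30), ncol = 2L))
```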
If you discover something bad or suspicious, and you really don't have much time and it's a very local thing, comment the problem and add `# FIXME:`.
Be precise in the description and err on the side of verbosity, otherwise other people
(possibly including yourself) will not understand what you meant when they read this in the future.
If you use a proper editor, it will help you search through these issues later.
In many cases it is a lot better to open an issue instead!
If you import API from a foreign package, do not refer to it all of the time with `::`.
Use this only for suggested packages (then it's required) or in case of name-clashes.
Bad (in an extension package importing `mlr3`):
mlr3::train()
mlr3::predict()
mlr3::resample()
Good (in an extension package importing `mlr3`):
train()
predict()
resample()
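For suggested packages, the `::` prefix is usually combined with a check that the package is installed. A minimal sketch, in which `jsonlite` merely stands in for an arbitrary suggested package and `to_json_if_available()` is a hypothetical helper:

```r
to_json_if_available = function(x) {
  # suggested packages are not attached, so check first, then call via ::
  if (!requireNamespace("jsonlite", quietly = TRUE)) {
    stop("Package \"jsonlite\" is required for this function.")
  }
  return(jsonlite::toJSON(x))
}
```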
Intelligent and experienced people stick to their style definition 99% of the time and are able to recognize the 1% of cases where deviations are not only OK, but better. In case of doubt, stick to the law.