R Style Guide
Version 1.0, written by Bernd Bischl
Every programmer knows that code is read more than it is written. Not having a consistent coding style is error-prone and annoying for the reader. It also looks very unprofessional. Of course, such a style convention should have been defined on the language level many, many years ago by R core. Unfortunately, it was not. So now every group or individual package developer has their own style - or none at all.
Good alternatives are available here:
Google: http://google-styleguide.googlecode.com/svn/trunk/Rguide.xml
Hadley: http://adv-r.had.co.nz/Style.html
Henrik: http://www.maths.lth.se/help/R/RCC/
In some cases I have shamelessly stolen from the guides above, and in quite a few places this guide deviates.
A few side remarks: Nobody really enjoys reading such standard definitions and I can easily imagine more enjoyable things than writing them. And while arguments could (maybe) be made for why some of the following rules are good rules, others are based on a subjective sense of aesthetics or simply on what I became accustomed to over the years. If people start arguing religiously over where to place curly braces, better go away and talk to less weird guys. In summary, I am not proposing that this is the best set of rules that exists. What is important is that you follow SOME style set. What follows is our personal choice. Stick to it if you collaborate with us on our packages.
-
Language: Code and documentation are always written in English, never in German, French or whatever. The same holds for file and directory names.
-
File names: File names end in .R and are meaningful. Don't make them too short, don't make them too long. No special characters or whitespace occur in them. Stick to [a-z], [A-Z], [0-9], _ and - nearly always.
-
Line length: Maximum line length is somewhere between 80 and 100 characters.
-
Tabs: Tabs are converted to spaces and tab width is 2. Configure your editor of choice for this.
-
Indentation: Always indent one level after opening a curly brace and remove one level after closing one. Indentation width is one tab width, which is 2 spaces. When a line break occurs inside parentheses, do not align the wrapped line with the first character inside the parenthesis, but also indent 2 spaces.
Good:
for (i in 1:3) {
  y = y + i
}
doThatThing(arg1 = "a_long_string_is_passed",
  arg2 = "a_long_string_is_passed_here_too", arg3 = "another_long_string")
Bad:
for (i in 1:3) {
y = y + i
    print("foo")
}
doThatThing(arg1 = "a_long_string_is_passed",
            arg2 = "a_long_string_is_passed_here_too",
            arg3 = "another_long_string")
-
Strings: Write strings with double quotes, like "hello", not with single quotes, like 'hello'.
-
Assignment operator: Use = instead of <- for assignments.
Good:
a = 3
Bad:
a <- 3
-
Spacing around operators and commas: Place one space on each side of all binary operators (=, ==, +, -, etc.). Following the examples below, this includes the = that passes a named argument in a function call. No space before a comma, one space after a comma.
Good:
a = 3
f(a, b)
f(a + b)
f(arg = "value")
if (a == b) {
  ...
}
Bad:
a=3
a =3
f(a,b)
f(a , b)
f(a+b)
if (a==b) {
  ...
}
-
Curly braces: The opening curly brace goes on the same line as the respective syntactic element it belongs to and never on its own line. Always have one space before the opening brace {. The closing curly brace } goes on its own line, except if it occurs before an else statement, see below.
Good:
for (i in 1:3) {
  y = y + i
}
Bad:
for (i in 1:3) {y = y + i}

for (i in 1:3){
  y = y + i
}

for (i in 1:3)
{
  y = y + i
}
-
if, for, while statements: Put a single space between if, for or while and the opening parenthesis ( that follows it. Do not write if (ok == TRUE) if ok is already a boolean value; write if (ok). If the body of the statement consists of only one line, the language allows us to omit the curly braces. This can be a good thing if it keeps the code together (less scrolling = better reading and understanding), but you should only use this when the code structure is very simple, not with, e.g., complicated, nested if statements. If you use it, always put the single-line body on a separate line and indent it. If in doubt, always use the braces. If you use curly braces with else, the closing curly brace before the else goes on the same line as the else.
Good:
if (condition) {
  ...
}

if (condition) {
  ...
} else {
  ...
}

# in rare cases OK
if (condition)
  x = 1

for (i in 1:10) {
  ...
}

while (not.done) {
  ...
}
Bad:
if(condition) {
  ...
}

if (condition) {
  ...
}
else {
  ...
}
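Keeping } and else on the same line is not only a matter of taste. At the top level of a script or on the console, R considers the if statement complete once the closing brace ends the line, so an else on the next line fails to parse. The sketch below checks this with parse(); the helper name parsesOk and the two example strings are made up for this illustration.

```r
# Hypothetical helper: TRUE if the given source text parses as R code.
parsesOk = function(code) {
  res = tryCatch(parse(text = code), error = function(e) NULL)
  return(!is.null(res))
}

# closing brace and else on the same line
same.line = "if (TRUE) {\n  x = 1\n} else {\n  x = 2\n}"
# else on its own line after the closing brace
next.line = "if (TRUE) {\n  x = 1\n}\nelse {\n  x = 2\n}"

parsesOk(same.line)
parsesOk(next.line)
```

The first call should return TRUE and the second FALSE. Inside a function body or another braced block the second form does parse, so the problem shows up mostly in top-level code.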
-
No extra whitespace for parentheses: Do not put whitespace before or after the parentheses ( and ) when defining or calling functions.
Good:
doIt(1, 4)
Bad:
doIt( 1, 4 )
doIt (1, 4)
-
Return statement: Try to use return() explicitly and do not rely on R's convention of returning the value of the last evaluated expression of the called function, especially if your function is longer and you "return" in multiple places. You may deviate if your function is short, e.g., for short anonymous functions. If your function is more like a procedure, i.e., it has no meaningful return value, have it return invisible(NULL).
Good:
calculateStuff = function(n) {
  if (n == 0)
    return(-1)
  y = 123
  return(y)
}

sapply(1:10, function(x) x^2)

showStuffOnConsole = function() {
  message("Hello")
  invisible(NULL)
}
Bad:
calculateStuff = function(n) {
  if (n == 0)
    -1
  y = 123
}

sapply(1:10, function(x) return(x^2))
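The bad version of calculateStuff shows why relying on the last evaluated expression is fragile: the -1 is evaluated and thrown away, and the function returns the value of the final assignment instead. Here is a small sketch to check this. The names calculateStuffBad and calculateStuffGood are made up for the comparison, and withVisible() from base R is used to inspect the invisible result.

```r
# the -1 is computed but never returned; the last expression is y = 123
calculateStuffBad = function(n) {
  if (n == 0)
    -1
  y = 123
}

calculateStuffGood = function(n) {
  if (n == 0)
    return(-1)
  y = 123
  return(y)
}

# procedure-like function without a meaningful return value
showStuffOnConsole = function() {
  message("Hello")
  invisible(NULL)
}

bad = calculateStuffBad(0)
good = calculateStuffGood(0)
shown = withVisible(showStuffOnConsole())
```

Here bad is 123 and good is -1, while shown$value is NULL and shown$visible is FALSE, so calling showStuffOnConsole() on the console prints only the message.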
-
One command per line and semicolon: Put every statement / command on its own line. Do not put a semicolon at the end of a statement. This is R, not C. In VERY rare cases you might put two very short statements on the same line and separate them with ; like this:
a = 1; b = 2
-
Comments: Use a single #, on the same level of indentation as the code you comment, to start a comment line. Usually, you should not put a comment on the same line as the code you comment. Combine meaningful identifier names and well-written code that is as self-documenting as possible with short, precise comment lines. Complicated stuff needs lengthier comments. No comments or too few comments are bad, but overly verbose or unnecessary comments are also bad (if less so).
-
Function names: Functions are named in "verb style", written in camel case and the name begins with a lowercase letter. Names have to be meaningful and are important, hence, invest some time to find good, expressive names. Don't make them too short, don't make them too long. In case of doubt, choose the longer, more expressive name, but don't overdo it.
Good:
doThatThing()
Bad:
doThatThingYouKnowWhatIMeanItIsReallyCool()
dtt()
do.that.thing()
dothatthing()
do_that_thing()
-
Variable and function argument names: Use lowercase letters and separate words with a dot. This makes it possible to visually distinguish functions from arguments / variables. (Yes, in R functions are first-class objects. We can live with that and this won't hurt or confuse us in 99.999% of cases. If you did not understand this subtle point, do not worry.) Names have to be meaningful and are important, hence, invest some time to find good, expressive names. Don't make them too short, don't make them too long. Here is a rule of thumb to decide whether a name should be shorter or longer: Is the variable used in various places of a long and complicated function after it was introduced? Make the name longer and very precise. Is the variable used in a local, very restricted context? A shorter name is probably not only OK, but even better.
Good:
multiply = function(a, b) { ... }

writeLinesToFile = function(file.path, lines, show.info = TRUE) { ... }

for (i in 1:10) {
  vec[i] = 1
}
Bad:
writeLinesToFile = function(filePath, lines, showInfo = TRUE) { ... }

# name too long, simply "i" would be OK
for (the.iterator in 1:10) {
  vec[the.iterator] = 1
}
-
Documenting and argument checks: Use roxygen2 to document your functions. Educate yourself here, see section "Documentation":
http://adv-r.had.co.nz/#package-development
A function is documented like this
https://github.com/berndbischl/BBmisc/blob/master/R/clipString.R
See how we also document the argument and return types in a formal fashion? Always do that. Yes, R is a dynamically typed language but in more than 90% of my function definitions I have a certain type in mind for my function arguments. This holds true for many R packages, but very often this information is not easily available from the help page.
A scalar integer parameter:
#' @param n [\code{integer(1)}]\cr
#'   My argument.
#'   Default is 1.
An integer vector of arbitrary size:
#' @param x [\code{integer}]\cr
#'   The vector.
An S3 object of a class you defined in the same package:
#' @param obj [\code{\link{MyS3Class}}]\cr
#'   A nice object.
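To show how documented types and argument checks can go together, here is a minimal sketch of a complete function. It is not taken from BBmisc: the function repeatString and its arguments are hypothetical, and the checks use only stopifnot() from base R.

```r
#' Repeat a string a given number of times.
#'
#' @param x [\code{character(1)}]\cr
#'   The string to repeat.
#' @param n [\code{integer(1)}]\cr
#'   Number of repetitions.
#'   Default is 1.
#' @return [\code{character(1)}].
#' @export
repeatString = function(x, n = 1L) {
  # check the documented types before doing any work
  stopifnot(is.character(x), length(x) == 1L)
  # accept whole numbers such as 3 as well as 3L
  stopifnot(is.numeric(n), length(n) == 1L, !is.na(n), n >= 0, n == round(n))
  return(paste(rep(x, n), collapse = ""))
}

repeated = repeatString("ab", 3)
```

With these inputs repeated is "ababab", and a call like repeatString(1, 2) stops with an error because x is not a character string.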
-
Code distribution in files: Try to put one single function definition into one .R file. Name the file like the function.
If you have some very short helper functions you can deviate from this.
-
Function length and abstraction: Good functions very often cover 1 to 3 screen pages. Of course, some complicated stuff sometimes is longer. If that happens, think about introducing another level of indirection, e.g., more functions or data types. Maybe this is a good time for refactoring? If your function or source file covers 5000 lines of code (Have seen those. Not just once.) you are doing it wrong - and your code will not be maintainable.
-
Local helper functions defined in a parent function: Can be OK, if the inner function is only used in this context and pretty simple. Otherwise try to avoid.
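A sketch of a case where a local helper seems fine: the inner function is tiny and only makes sense inside its parent. Both function names are hypothetical.

```r
summarizeColumns = function(data) {
  # simple helper, only used inside summarizeColumns
  formatRange = function(x) {
    return(paste(min(x), "to", max(x)))
  }
  return(sapply(data, formatRange))
}

col.ranges = summarizeColumns(data.frame(a = 1:3, b = c(10, 20, 30)))
```

If formatRange grew more complicated or became useful elsewhere, it would be a candidate for its own top-level function.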
-
Global variables: Almost never use them.
-
Object-oriented programming: Use S3 instead of S4. S4 results in code bloat without real benefits. Yes, S3 sucks too, but less so. No final verdict on reference classes. Whether an OO-style of programming makes sense for your R project cannot be answered in general. If you need to define your own abstract data types and respective operations on them, it likely does make sense to use OO.
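For readers who have not used S3 before, here is a minimal sketch of an abstract data type with one operation on it. The class name Interval and the functions makeInterval and getWidth are invented for this example.

```r
# constructor for the hypothetical S3 class "Interval"
makeInterval = function(lower, upper) {
  stopifnot(is.numeric(lower), is.numeric(upper), lower <= upper)
  obj = list(lower = lower, upper = upper)
  class(obj) = "Interval"
  return(obj)
}

# generic function: dispatches on the class of obj
getWidth = function(obj) {
  UseMethod("getWidth")
}

# method of getWidth for class "Interval"
getWidth.Interval = function(obj) {
  return(obj$upper - obj$lower)
}

# method of the existing generic print() for class "Interval"
print.Interval = function(x, ...) {
  cat("Interval [", x$lower, ", ", x$upper, "]\n", sep = "")
  invisible(x)
}

iv = makeInterval(2, 5)
width = getWidth(iv)
```

Here width is 3. Note that S3 method names are built as generic.class, so the dot in getWidth.Interval is required by the dispatch mechanism and not a violation of the naming rules above.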
-
Don't repeat yourself: Copy-paste code is always an indicator that something is wrong.
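One common way out of copy-paste code is to turn the repeated expression into a function and apply it. A small sketch with a hypothetical function rescaleToUnitRange and made-up data; the copy-paste version is shown only as a comment.

```r
# Copy-paste version, one line per column:
# data$a = (data$a - min(data$a)) / (max(data$a) - min(data$a))
# data$b = (data$b - min(data$b)) / (max(data$b) - min(data$b))

# Refactored: the repeated expression becomes a function
rescaleToUnitRange = function(x) {
  return((x - min(x)) / (max(x) - min(x)))
}

data = data.frame(a = c(1, 2, 3), b = c(10, 30, 50))
# apply the function to every column and keep the data.frame structure
data[] = lapply(data, rescaleToUnitRange)
```

A fix to the rescaling logic now has to be made in one place only, and adding a column needs no new code.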
-
Exceptions to the rules: Intelligent and experienced people stick to their style definition 99% of the time and are able to recognize the 1% of cases where deviations are not only OK, but better. In case of doubt, stick to the law.
-
The existing code base always has precedence: If the style of an already existing project differs from the above, stick to its style.