berndbischl edited this page Nov 29, 2013 · 34 revisions

Version 1.0, written by Bernd Bischl

Every programmer knows that code is read more often than it is written. An inconsistent coding style is error-prone and annoying for the reader. It also looks very unprofessional. Of course, such a style convention should have been defined at the language level many, many years ago by R core. Unfortunately, it was not. So now every group or single package developer has their own style - or none at all. What follows is our personal choice. Stick to it if you collaborate with us on our packages.

Good alternatives are available here:

Google: http://google-styleguide.googlecode.com/svn/trunk/Rguide.xml

Hadley: http://adv-r.had.co.nz/Style.html

Henrik: http://www.maths.lth.se/help/R/RCC/

In some cases I have shamelessly stolen from the guides above, and in quite a few places this guide deviates.

A few side remarks: Nobody really enjoys reading such standard definitions, and I can easily imagine more enjoyable things than writing them. While arguments could (maybe) be made for why some of the following rules are good rules, others are based on a subjective feeling of aesthetics or simply on what I became accustomed to over the years. If people start arguing religiously over where to place curly braces, better go away and talk to less weird guys. In summary, I am not proposing that this is the best set of rules that exists. What is important is that you follow SOME style set.

  1. Language: Code and documentation are always written in English, never in German, French or whatever. The same holds for file and directory names.

  2. File names: File names end in .R and are meaningful. Don't make them too short, don't make them too long. No special characters or whitespace occur in them. Stick to [a-z], [A-Z], [0-9] and [_-] nearly always.

  3. Line length: Maximum line length is somewhere between 80 and 100 characters.

  4. Tabs: Tabs are converted to spaces and tab width is 2. Configure your editor of choice for this.

  5. Indentation: Always indent one level after opening a curly brace and remove one level after closing one. Indentation width is one tab width, which is 2 spaces. When a line break occurs inside parentheses, do not align the wrapped line with the first character inside the parenthesis, but also indent 2 spaces.

    Good:

    for (i in 1:3) {
      y = y + i
    }
    
    doThatThing(arg1 = "a_long_string_is_passed", arg2 = "a_long_string_is_passed_here_too", 
      arg3 = "another_long_string")
    

    Bad:

    for (i in 1:3) {
    y = y + i
    print("foo")
    }
    
    doThatThing(arg1 = "a_long_string_is_passed", arg2 = "a_long_string_is_passed_here_too", 
                arg3 = "another_long_string")
    
  6. Strings: Define strings like "hello" instead of 'hello'.

  7. Assignment operator: Use = instead of <- for assignments. Put one space before and after = in assignments.

    Good:

    a = 3

    Bad:

    a <- 3
    a=3
    a =3
  8. Spacing around operators and commas: Place one space on each side of all binary operators (=, ==, +, -, etc.). FIXME: what about = when passing parameters in a function call? No space before a comma, one space after a comma.

    Good:

    f(a, b)
    
    f(a + b)
    
    f(arg = "value")
    
    if (a == b) {
      ...
    }

    Bad:

    f(a,b)
    
    f(a , b)
    
    f(a+b)
    
    if (a==b) {
      ...
    }
  9. Curly braces: The opening curly brace goes on the same line as the respective syntactic element it belongs to and never on its own line. Always have one space before the opening brace {. The closing curly brace } goes on its own line, except if it occurs before an else statement, see below.

    Good:

    for (i in 1:3) {
      y = y + i
    }

    Bad:

    for (i in 1:3) {y = y + i}
    
    for (i in 1:3){
      y = y + i
    }
    
    for (i in 1:3)
    {
      y = y + i
    }
  10. if, for, while statements: Put a single space between if, for, while, repeat and the opening parenthesis ( or brace { that follows. Do not write if (ok == TRUE) if ok already is a boolean value. Write if (ok). If the body of the statement consists of only one line, the language allows us to omit the curly braces. This can be a good thing if it keeps the code together (less scrolling = better reading and understanding), but you should only use this when the code structure is very simple, not with, e.g., complicated nested if-statements. If you use it, always put the single-line body on a separate line and indent it. If in doubt, always use the braces. If an

    Good:

    if (condition) {
      ...
    }
    
    if (condition) {
      ...
    } else {
      ...
    }
    
    # in rare cases OK
    if (condition)
      x = 1
    
    for (i in 1:10) {
      ...
    }
    
    while (not.done) {
      ...
    }

    Bad:

    if(condition) {
      ...
    }
    
    if (condition) {
      ...
    }
    else {
      one or more lines
    }
    
  11. No extra whitespace around parentheses: Do not put whitespace between the function name and the opening parenthesis (, directly after (, or directly before ) when defining or calling functions.
    Good:

    doIt(1, 4)

    Bad:

    doIt( 1, 4 )
    doIt (1, 4)
  12. Return statement: Try to explicitly use return() and do not rely on R's convention to return the value of the last evaluated expression of the called function, especially if your function is longer and you "return" in multiple places. Deviate if your function is shorter, e.g., for short anonymous functions. If a function is more like a procedure, i.e., it has no meaningful return value, use return(invisible(NULL)).
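
    A minimal sketch of this rule. The function names computeMean and printSummary are made up for illustration and do not come from any package.

    ```r
    # Hypothetical example: explicit returns in a function with several exit points.
    computeMean = function(x) {
      if (length(x) == 0L) {
        return(NA_real_)
      }
      return(sum(x) / length(x))
    }

    # Hypothetical procedure without a meaningful return value.
    printSummary = function(x) {
      cat("n =", length(x), "\n")
      return(invisible(NULL))
    }

    # A short anonymous function may rely on the implicit return value.
    squares = sapply(1:3, function(i) i^2)
    ```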

  13. One command per line and semicolon: Put every statement / command on its own line. Do not put a semicolon at the end of a statement. This is R, not C. In VERY rare cases you can put two very short statements on a single line and separate them with ; like this:

    a = 1; b = 2
  14. Comments: Use a single #, on the same level of indentation as the code you comment, to start a comment line. Usually, you should not put a comment on the same line as the code you comment. Combine meaningful identifier names and well-written code that is as self-documenting as possible with short, precise comment lines. Complicated stuff needs lengthier comments. No or too few comments are bad, but overly verbose or unnecessary comments are also bad (though less so).
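
    As a rough illustration, here is what that could look like. The function scaleToUnitRange is a hypothetical helper, not part of any package; the comments sit on their own lines, at the indentation level of the code they describe.

    ```r
    # Hypothetical helper: scale a numeric vector to the range [0, 1].
    scaleToUnitRange = function(x) {
      value.range = range(x)
      # A constant vector has zero width, so avoid dividing by zero.
      if (value.range[1] == value.range[2]) {
        return(rep(0, length(x)))
      }
      return((x - value.range[1]) / (value.range[2] - value.range[1]))
    }
    ```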

  15. Function names: Functions are named in "verb style", written in camel case and the name begins with a lowercase letter. Names have to be meaningful and are important, hence, invest some time to find good, expressive names. Don't make them too short, don't make them too long. In case of doubt, choose the longer, more expressive name, but don't overdo it.

    Good:

    doThatThing()

    Bad:

    doThatThingYouKnowWhatIMeanItIsReallyCool()
    dtt()
    do.that.thing()
    dothatthing()
    do_that_thing()

    Of course, especially the last "bad" example do_that_thing() is used by other, good R coders and not bad in itself, if used consistently. We just have a different preference.

  16. Variable and function argument names: Use lowercase letters and separate words with a dot. This makes it possible to visually distinguish functions from arguments / variables. (Yes, in R functions are first-class objects. We can live with that and this won't hurt or confuse us in 99.999% of cases. If you did not understand this subtle point, do not worry.) Names have to be meaningful and are important, hence, invest some time to find good, expressive names. Don't make them too short, don't make them too long. Here is a rule of thumb to decide whether a name should be shorter or longer: Is the variable used in various places of a long and complicated function after it was introduced? Make the name longer and very precise. Is the variable used in a local, very restricted context? A shorter name is probably not only OK, but even better.

    Good:

    multiply(a, b)
    

    Bad:

    # name too long, simply "i" would be OK
    for (the.iterator in 1:10) {
      vec[the.iterator] = 1
    }
    dtt
    do.that.thing
    dothatthing()
    do_that_thing()
  17. Code distribution in files: Try to put one single function definition into one .R file. Name the file after the function.
    If you have some very short helper functions you can deviate from this.

  18. Function length and abstraction: Good functions are very often 1 to 3 screen pages long. Of course, some complicated stuff sometimes is longer. If that happens, think about introducing another level of indirection, e.g., more functions. Maybe this is a good time for refactoring? If your function or source file covers 5000 lines of code (I have seen those. Not just once.) you are doing it wrong - and your code will not be maintainable.

  19. Local helper functions defined in parent functions: Can be OK, if the inner function is only used in that context and is pretty simple. Otherwise try to avoid them.
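
    A sketch of a case where I would consider this OK. Both summarizeColumns and describeColumn are hypothetical names chosen for illustration.

    ```r
    # Hypothetical example: a small inner helper used only inside its parent.
    summarizeColumns = function(data) {
      # Simple and only needed here, so a local definition is acceptable.
      describeColumn = function(col) {
        return(sprintf("%s (n = %i)", class(col)[1], length(col)))
      }
      return(vapply(data, describeColumn, character(1)))
    }
    ```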

  20. Nearly never use global variables.
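
    A small sketch of the usual alternative, namely passing the value in as an argument. The names addOffsetGlobal and addOffset are made up for this example.

    ```r
    # Avoid: the result silently depends on a variable defined outside the function.
    offset = 10
    addOffsetGlobal = function(x) {
      return(x + offset)
    }

    # Prefer: everything the function needs is passed in as an argument.
    addOffset = function(x, offset) {
      return(x + offset)
    }
    ```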

  21. Object-oriented programming: Use S3 instead of S4. S4 results in code bloat without real benefits. Yes, S3 sucks too, but less so. No final verdict on reference classes. Whether an OO style of programming makes sense for your R project cannot be answered in general. If you need to define your own abstract data types and respective operations on them, it likely does.
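
    For readers who have not used S3 before, a minimal sketch of an abstract data type with its operations. The class "interval" and the functions makeInterval and getWidth are hypothetical. Note that S3 method names must follow the generic.class pattern, so the dot in them is required by the language.

    ```r
    # Hypothetical S3 class "interval" with a constructor and two methods.
    makeInterval = function(lower, upper) {
      stopifnot(lower <= upper)
      obj = list(lower = lower, upper = upper)
      class(obj) = "interval"
      return(obj)
    }

    # Method for the existing generic print().
    print.interval = function(x, ...) {
      cat("[", x$lower, ", ", x$upper, "]\n", sep = "")
      return(invisible(x))
    }

    # A new generic plus its method for "interval".
    getWidth = function(x) {
      UseMethod("getWidth")
    }

    getWidth.interval = function(x) {
      return(x$upper - x$lower)
    }
    ```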

  22. Don't repeat yourself: Copy-paste code is always an indicator that something is wrong.
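
    A tiny sketch of what this means in practice. The helper standardize is a hypothetical name; the point is only that the repeated expression moves into one function.

    ```r
    a = c(1, 2, 3)
    b = c(10, 20, 30)

    # Copy-paste version: the same expression appears twice.
    a.scaled = (a - mean(a)) / sd(a)
    b.scaled = (b - mean(b)) / sd(b)

    # Hypothetical helper that states the logic once.
    standardize = function(x) {
      return((x - mean(x)) / sd(x))
    }
    a.scaled2 = standardize(a)
    b.scaled2 = standardize(b)
    ```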

  23. Exceptions to the rules: Intelligent and experienced people stick to their style definition 99% of the time and are able to recognize the 1% of cases where deviations are not only OK, but better. In case of doubt, stick to the law.

  24. The existing code base always takes precedence: If the style of an already existing project differs from the above, stick to its style.
