Why?

November 23, 2010

R Style Guide

Filed under: R | csgillespie @ 3:51 pm

Each year I have the pleasure (actually it’s quite fun) of teaching R programming to first year mathematics and statistics students. The vast majority of these students have no experience of programming, yet think they are good with computers because they use Facebook!

Debugging students' R scripts

The class has around 100 students, and there are eight practicals. In some of these practicals the students have to submit code. Although the code is “marked” by a script, the script only detects whether the code is correct. Therefore, I have to go through a lot of R functions by hand and find bugs.

  • First year the course ran, I had no style guide.
    • Result: spaghetti R code.
  • Second year: asked the students to indent their code.
    • In fact, during practicals I refused to debug any R code that hadn’t been indented.
    • Result: nicer looking code and more correct code.
  • This year I intend to introduce an R style guide based loosely on Google’s and Hadley’s guides.
    • One point that’s in my guide and not (and shouldn’t be) in the above style guides is that all functions must have one and only one return statement. I tend to follow the single return rule for the majority of my R functions, but do, on occasion, break it. The bible of code styling, Code Complete, recommends that you use returns judiciously.

R Style Guide

This style guide is intended to be very light touch. It’s intended to give students the basis of good programming style, not to be a guide for submitting to CRAN.

File names

File names should end in .R and, of course, be meaningful. Files should be stored in a meaningful directory – not your Desktop!

GOOD: predict_ad_revenue.R
BAD: foo.R

Variable & Function Names

Variable names should be lowercase. Use _ to separate words within a name. Strive for concise but meaningful names (this is not easy!)

GOOD: no_of_rolls
BAD: noOfRolls, free

Function names have initial capital letters and are written in CamelCase.

GOOD: CalculateAvgClicks
BAD: calculate_avg_clicks , calculateAvgClicks

If possible, make function names verbs.
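
As a rough sketch of the two naming rules side by side, here is a small dice example. The names no_of_rolls (borrowed from the GOOD example above), rolls and RollDice are purely illustrative and are not part of the course material.

```r
# Variable: lowercase, words separated by underscores.
no_of_rolls = 100

# Function: CamelCase with an initial capital, named with a verb.
# RollDice is a hypothetical helper used only for illustration.
RollDice = function(no_of_rolls) {
  rolls = sample(1:6, size = no_of_rolls, replace = TRUE)
  return(rolls)
}

rolls = RollDice(no_of_rolls)
length(rolls)
```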

Curly Braces

An opening curly brace should never go on its own line; a closing curly brace should always go on its own line.

GOOD:
if (x == 5) {
  y = 10
}
RtnX = function(x) {
  return(x)
}
BAD:
RtnX = function(x)
{
  return(x)
}
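
The placement of the opening brace is a matter of taste, but the closing brace before an else is not: at the top level of a script, R treats the if statement as finished once it reaches the end of the line with the closing brace, so an else on the following line fails to parse. The sketch below checks this with parse(); CanParse is a hypothetical helper written for this illustration.

```r
# else on the same line as the closing brace: parses fine.
good_code = "if (x == 5) {\n  y = 10\n} else {\n  y = 0\n}"

# else on its own line: a syntax error at the top level of a script.
bad_code = "if (x == 5) {\n  y = 10\n}\nelse {\n  y = 0\n}"

# Hypothetical helper: TRUE if the text parses as R code, FALSE otherwise.
CanParse = function(code) {
  result = tryCatch({
    parse(text = code)
    TRUE
  }, error = function(e) FALSE)
  return(result)
}

CanParse(good_code)  # TRUE
CanParse(bad_code)   # FALSE
```

Inside a function body the second layout does parse, because the enclosing braces tell R that more code is coming, but keeping else on the same line as the closing brace works everywhere.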

Functions

Functions must have a single return statement just before the final brace.

GOOD:
IsNegative = function(x) {
  if (x < 0) {
    is_neg = TRUE
  } else {
    is_neg = FALSE
  }
  return(is_neg)
}
BAD:
IsNegative = function(x) {
  if (x < 0){
    return(TRUE)
  } else {
    return(FALSE)
  }
}

Of course, the body of the above function could and should be simplified to:
is_neg = (x < 0)
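
Put together, a minimal sketch of the simplified function, still with a single return just before the final brace, might look like this:

```r
IsNegative = function(x) {
  # The comparison already yields TRUE or FALSE, so no if/else is needed.
  is_neg = (x < 0)
  return(is_neg)
}

IsNegative(-3)           # TRUE
IsNegative(c(-1, 0, 2))  # TRUE FALSE FALSE
```

Because the comparison is vectorised, this version also handles a vector of values, which the if/else versions above are not designed for.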

Commenting guidelines

Comment your code. Entire commented lines should begin with # and one space. Comments should explain the why, not the what.
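
A small sketch of the difference between a "what" comment and a "why" comment. The simulation and the names no_of_rolls, rolls and prop_six are made up for illustration.

```r
no_of_rolls = 1000
set.seed(1)
rolls = sample(1:6, size = no_of_rolls, replace = TRUE)

# BAD (the what): divide the number of sixes by the number of rolls.
# GOOD (the why): estimate P(six) by simulation so that it can be
# compared with the exact value of 1/6.
prop_six = sum(rolls == 6) / no_of_rolls
prop_six
```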

What’s missing

I decided against putting in a section on “spacing”, i.e. placing spaces around all binary operators (=, +, -, etc.). I think spacing may be taking style a bit too far for a first year course.
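
For reference, there is one case in R where spacing changes the meaning of the code rather than just its appearance: a less-than comparison against a negative number. The sketch below uses an arbitrary variable x.

```r
x = 5

# With spaces this is a comparison: is x less than -2?
x < -2  # FALSE, and x is still 5

# Without spaces, R reads <- as the assignment operator.
x<-2    # x is now 2
x
```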

Comments welcome!


24 Comments »

  1. Maybe the GOOD RtnX function should return y instead of x.

    Comment by Conor — November 23, 2010 @ 4:06 pm

    • Thanks. After looking at that section I decided to change the functions entirely. Naming functions is really tricky!

      Comment by csgillespie — November 23, 2010 @ 4:16 pm

  2. I’m pretty much an anarchist, yet still I think encouraging spaces is a good idea — both for readability and because of:

    x<2
    versus
    x<-2

    Comment by Pat Burns — November 23, 2010 @ 6:18 pm

    • Yes, I think you’re correct. In my original guide I had a section on spaces, then deleted it. I think I’ll have an “optional” section at the end of the guide containing examples on spacing.

      Comment by csgillespie — November 23, 2010 @ 10:50 pm

  3. Coding style is obviously a personal choice, but I would suggest that “no_of_rolls” is a BAD choice.

    Using the underscores is legal, but goes against tradition in the S/R language.

    For example, function names in R are already an inconsistent mess:

    row.names, rownames
    browseURL, contrib.url, fixup.package.URLs
    package.contents, packageStatus
    mahalanobis, TukeyHSD
    getMethod, getS3method

    No need to make things even worse by introducing underscores.

    Comment by Kevin Wright — November 23, 2010 @ 7:08 pm

    • I realise that using underscores is generally frowned on. However, there are some functions in base R that use “_”. Just run grep("^[^\\.]*$", apropos("_"), value = TRUE). Also, in the other languages I program in (Python and C), it’s encouraged to use underscores.

      As you’ve mentioned, the R naming system is almost non-existent, so I think that the most important thing is consistency in your own functions.

      Comment by csgillespie — November 23, 2010 @ 11:01 pm

      • I’ve another reason why underscores are a bad choice for R: ESS, the Emacs Speaks Statistics mode, turns every underscore into a <-, which is very practical. Functions and variables with underscores in their name, however, become a nightmare to type out since you have to press underscore twice to get one. I personally use points for variables (though not for functions, where this might conflict with generic functions), and for functions I use some very inconsistent camel case rules.

        Comment by float — November 25, 2010 @ 12:00 pm

      • I realised that ESS was an argument that was sometimes used against “_”, but I just find it the most bizarre argument I’ve ever come across. ESS shouldn’t lead R style, it should be the other way about.

        Besides, you can just turn off this behaviour in Emacs by adding (ess-toggle-underscore nil) to your .emacs file.

        I suppose I just find it frustrating that each language has an opposing style guide. In Python and C, “_” is recommended for variable names, in R it’s frowned on. I just want to have a consistent styling (where possible) across different languages.

        For example, some of the students will do FORTRAN 90 in second year. The above style is still a useful template for that course.

        Comment by csgillespie — November 25, 2010 @ 2:31 pm

      • ESS is a chicken-and-egg problem (which convention led to which in the first place). Nevertheless, I’m not convinced that the way people use a piece of software (e.g. ESS) shouldn’t define the style we write in. The question is: is it just an annoyance to me, or is ESS actually used by other people? (I sincerely don’t know; at my institute I’m the only one.)
        I commend your attempt to provide style guides that are useful for many languages. But be aware that demanding style for R because FORTRAN90 is programmed like that might seem equally bizarre. (A friend of mine puts semicolons on the end of each line of his R code)

        Comment by float — November 25, 2010 @ 3:58 pm

  4. [...] read a funny but much to the point blog entry on the difficulties of teaching proper programming skills to first year students! I will certainly [...]

    Pingback by The joys of teaching R « Xi'an's Og — November 23, 2010 @ 8:07 pm

  5. I never understood the objection to having the left brace on a new line. I find it MUCH easier to be certain my braces match and my indentation is consistent that way, especially in C and C++ where loops and tests are frequently nested several levels deep. In such cases I often put a comment following key closing braces. This practice has saved my bacon many times in debugging. Admittedly, I don’t tend to have as many nested loops in R. The key is to pick a style and be consistent.

    Comment by John Minter — November 23, 2010 @ 11:10 pm

    • I don’t really have a strong opinion on the whole braces thing. I just chose this format, since both Google and Hadley use it. In the last year, I’ve started to use Emacs as my text editor. Emacs just takes care of code indenting.

      Comment by csgillespie — November 23, 2010 @ 11:17 pm

  6. (1) Use some sort of distinctive suffix for data frames, e.g. crime.data. Ditto for fitted models, e.g. burglary.location.model.

    (2) Use consistent suffixes for related or derived values, so in statistics the variable x has sample statistics x.mean, x.n, x.variance, etc.

    (3) ALLCAPS variable names are distinctive; I’m torn between using them for manifest constants or matrices.

    Comment by therandomtexan — November 24, 2010 @ 12:29 am

    • I agree with your points (although I really don’t like using “.” in variable names). However, I think for the guide to work with 100+ students your points may be a step too far. Just getting students to indent code last year was a struggle!

      I may include points 1 & 2 in an optional section at the bottom of the guide. In the course, manifest constants and matrices aren’t really used, so I definitely won’t mention the ALLCAPS part.

      Comment by csgillespie — November 24, 2010 @ 1:13 pm

      • It is worth making the point about good style, even beyond braces, to students. I think you’re right that you don’t want to enforce all the points in your class, but letting them see a good fairly complete style guide will get them thinking about these things. Some will get a lot out of that, and you can make that “at least get indentation right” point the “the enforceable condition” if you will.

        Comment by Clark — December 2, 2010 @ 1:52 pm

      • Yes, I agree. I’ll probably put a slightly expanded version in the notes with “optional” sections.

        I have to admit that I was expecting a few responses along the lines of “here’s the style guide I use in my class”, but I didn’t get any. It would be interesting to see how other people teach R.

        Comment by csgillespie — December 2, 2010 @ 7:27 pm

  7. Coding guidelines is a complex topic and it is common to see constructs recommended against because they have some undesirable characteristic without considering whether the alternative constructs have characteristics that are even worse (e.g., your one return rule).

    For lots of detailed discussion see the coding guidelines sections of the book (a free download; ignore the C specific bits) The New C Standard: An Economic and Cultural Commentary.

    More than you probably want to know about identifier naming.

    Comment by Derek Jones — November 24, 2010 @ 2:18 pm

    • Thanks for the links. I agree that the “single return rule” isn’t a rule that should always be followed and, to be honest, if I taught a follow-on course I would relax it. However, the vast majority of these students will never program in the real world and so I think introducing some structure at this early stage is a good thing. These students are training to be statisticians/mathematicians, not computer scientists.

      Also, in a computer practical I have to look at lots of code and be able to debug it very quickly. Last year, I was looking at programs which had numerous return statements, so debugging was a nightmare for me and the student.

      Usually in the class there are a couple of very good students who can already program. I would allow these students much more flexibility.

      Comment by csgillespie — November 24, 2010 @ 3:44 pm

      • I would query whether a “single return rule” should ever be followed, but who am I to argue against decades of folklore?

        You could at least change the example so that it does not suggest that writing is_neg = (x < 0) is the wrong thing to do.

        Comment by Derek Jones — November 26, 2010 @ 1:54 am

      • Just to reiterate, I mark 100 pieces of coursework each week for eight weeks. That’s a lot of marking. Last year, students were free and easy with return statements, making quick debugging incredibly hard.

        I completely agree that I should highlight that is_neg = (x < 0) is the correct thing to do. Thanks.

        Comment by csgillespie — November 26, 2010 @ 9:49 am

  8. I’m OCD and a style Nazzi and would have fits if I had to read your code. Here’s the one true style:

    f <- function(x, i)
    {
       y <- x[ , i] + 3
       return(y)
    }

    Of course I would never write that particular function except as a style example.

    Comment by Rob Steele — November 26, 2010 @ 2:53 pm

    • To be honest, the style you describe is how I format functions, and I use the style described in the article for for loops and if statements. However, I thought it would be easier for me (and students) to enforce a single bracket style.

      I know, I sold out ;)

      For info, this course isn’t just about programming, I have to teach them some statistics as well.

      Comment by csgillespie — November 26, 2010 @ 2:59 pm

  9. Would be interested to understand the reasoning for “no opening braces on their own line”. I do it all the time and find it much clearer to read, particularly in connection with proper indentation. The code example provided in comment 8 proves the case, I believe.

    Comment by Rainer — December 9, 2012 @ 12:42 am

  10. Yet Another Random Suggestion: Since R specifies function methods with a period, e.g. ‘MyFunc.myclass’, I find it confusing or misleading to name objects with a period, e.g. ‘mydata.fitted’. Not to mention the dissonance that periods in object names cause people familiar with Ruby programming :-) .

    Comment by carlwitthoft — December 18, 2012 @ 2:16 pm

