Why?

November 23, 2010

R Style Guide

Filed under: R, by csgillespie @ 3:51 pm

Each year I have the pleasure (actually it’s quite fun) of teaching R programming to first-year mathematics and statistics students. The vast majority of these students have no programming experience, yet think they are good with computers because they use Facebook!

Debugging students' R scripts

The class has around 100 students, and there are eight practicals. In some of these practicals the students have to submit code. Although the code is “marked” by a script, the script only detects whether the code is correct. Therefore, I have to go through a lot of R functions by hand to find the bugs.

  • First year the course ran, I had no style guide.
    • Result: spaghetti R code.
  • Second year: asked the students to indent their code.
    • In fact, during practicals I refused to debug any R code that hadn’t been indented.
    • Result: nicer looking code and more correct code.
  • This year I intend to introduce an R style guide based loosely on Google’s and Hadley’s guides.
    • One point that’s in my guide and not (and shouldn’t be) in the above style guides is that all functions must have one and only one return statement. I tend to follow the single return rule for the majority of my R functions but do, on occasion, break it. The bible of code styling, Code Complete, recommends that you use returns judiciously.

R Style Guide

This style guide is intended to be very light touch. It’s intended to give students the basis of good programming style, not to be a guide for submitting to CRAN.

File names

File names should end in .R and, of course, be meaningful. Files should be stored in a meaningful directory – not your Desktop!

GOOD: predict_ad_revenue.R
BAD: foo.R

Variable & Function Names

Variable names should be lowercase. Use _ to separate words within a name. Strive for concise but meaningful names (this is not easy!)

GOOD: no_of_rolls
BAD: noOfRolls, free

Function names have initial capital letters and are written in CamelCase.

GOOD: CalculateAvgClicks
BAD: calculate_avg_clicks, calculateAvgClicks

If possible, make function names verbs.
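
As a rough illustration, the naming rules above might combine as in the sketch below. The names RollDice and no_of_rolls are made up for this example and are not part of the guide.

```r
# Variable names: lowercase, words separated by underscores.
no_of_rolls = 10

# Function names: CamelCase with an initial capital, ideally a verb.
# RollDice is a hypothetical helper used only for illustration.
RollDice = function(no_of_rolls) {
  rolls = sample(1:6, no_of_rolls, replace = TRUE)
  return(rolls)
}

rolls = RollDice(no_of_rolls)
```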

Curly Braces

An opening curly brace should never go on its own line; a closing curly brace should always go on its own line.

GOOD:
if (x == 5) {
  y = 10
}
RtnX = function(x) {
  return(x)
}
BAD:
RtnX = function(x)
{
  return(x)
}

Functions

Functions must have a single return statement just before the final brace.

GOOD:
IsNegative = function(x) {
  if (x < 0) {
    is_neg = TRUE
  } else {
    is_neg = FALSE
  }
  return(is_neg)
}
BAD:
IsNegative = function(x) {
  if (x < 0){
    return(TRUE)
  } else {
    return(FALSE)
  }
}

Of course, the above function could and should be simplified to:
is_neg = (x < 0)
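
Written out as a complete function, the simplified version might look like the sketch below. It still follows the single return rule and, unlike the if/else version, it also works element-wise on vectors.

```r
IsNegative = function(x) {
  # The comparison already yields TRUE or FALSE, so no if/else is needed
  is_neg = (x < 0)
  return(is_neg)
}

single_result = IsNegative(-3)           # TRUE
vector_result = IsNegative(c(-1, 0, 2))  # TRUE FALSE FALSE
```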

Commenting guidelines

Comment your code. Entire commented lines should begin with # and one space. Comments should explain the why, not the what.
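
A small sketch of the difference between a “what” comment and a “why” comment. The variable names and the percentage scenario are invented for illustration.

```r
# Hypothetical input: an interest rate entered as a percentage
rate_percent = 5

# BAD (says what the code does, which is already obvious):
# Divide rate_percent by 100
rate = rate_percent / 100

# GOOD (says why the code does it):
# Rates are entered as percentages, but later formulas expect proportions
rate = rate_percent / 100
```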

What’s missing

I decided against putting a section in on “spacing”, i.e. place spaces around all binary operators (=, +, -, etc.). I think spacing may be taking style a bit too far for a first year course.

Comments welcome!

November 16, 2010

Assignment operators in R: ‘=’ vs. ‘<-’

Filed under: R, by csgillespie @ 7:33 pm

In R, you can use both ‘=’ and ‘<-’ as assignment operators. So what’s the difference between them and which one should you use?

What’s the difference?

The main difference between the two assignment operators is scope. It’s easiest to see the difference with an example:

## Delete x (if it exists)
> rm(x)
> mean(x = 1:10) # [1] 5.5
> x # Error: object 'x' not found

Here x is declared within the function’s scope, so it doesn’t exist in the user workspace. Now, let’s run the same piece of code using the <- operator:

> mean(x <- 1:10) # [1] 5.5
> x # [1] 1 2 3 4 5 6 7 8 9 10

This time the x variable is declared within the user workspace.
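
The same comparison can be checked programmatically. The sketch below uses exists() with inherits = FALSE so that only the current environment is inspected; the result variable names are made up for the example.

```r
# Start from a clean slate: remove x only if it exists here
if (exists("x", inherits = FALSE)) rm(x)

# With "=", x is just the name of mean()'s argument
m1 = mean(x = 1:10)
x_after_equals = exists("x", inherits = FALSE)  # FALSE

# With "<-", x is assigned in the calling environment
m2 = mean(x <- 1:10)
x_after_arrow = exists("x", inherits = FALSE)   # TRUE
```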

When does the assignment take place?

In the code above, you may be tempted to think that we “assign 1:10 to x, then calculate the mean.” This would be true for languages such as C, but it isn’t true in R. Consider the following function:

> a <- 1
> f <- function(a) return(TRUE)
> f(a <- a + 1); a
[1] TRUE
[1] 1

Notice that the value of a hasn’t changed! R evaluates function arguments lazily, so the value of a will only change if the function actually needs to evaluate the argument. This can lead to unpredictable behaviour:

> f <- function(a) if(runif(1)>0.5) TRUE else a
> f(a <- a+1);a
[1] 2
[1] 2
> f(a <- a+1);a
[1] TRUE
[1] 2
> f(a <- a+1);a
[1] 3
[1] 3
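
A deterministic version of the same effect can be sketched with base R’s force(), which evaluates its argument immediately. The function names IgnoreArg and UseArg are hypothetical and used only for illustration.

```r
a = 1

# Never touches its argument, so the assignment is never evaluated
IgnoreArg = function(a) {
  return(TRUE)
}

# Forces evaluation of its argument, so the assignment does happen
UseArg = function(a) {
  force(a)
  return(TRUE)
}

IgnoreArg(a <- a + 1)
a_after_ignore = a   # still 1

UseArg(a <- a + 1)
a_after_force = a    # now 2
```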

Which one should I use?

Well there’s quite a strong following for the “<-” operator:

  • The Google R style guide prohibits the use of “=” for assignment.
  • Hadley Wickham’s style guide recommends “<-”.
  • If you want your code to be compatible with S-plus you should use “<-”.
    • Update: following a comment from David Smith below, it seems that S-plus now accepts “=”.
  • I believe that the general R community recommends using “<-” – see for example this link in the mailing list.

However, I tend to always use the “=” operator for the following reasons:

  • The other languages I program in (Python, C and occasionally JavaScript) use the “=” operator.
  • It’s quicker to type “=” than “<-”.
  • Typically, when I declare a variable, I only want it to exist in the current workspace.
  • Since I have the pleasure of teaching undergraduates their first course in programming, using “=” avoids misleading expressions like if (x[1]<-2)
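
The misleading expression in the last point can be seen in the short sketch below, where x and y are made-up example vectors. A reader may intend “is x[1] less than -2?”, but R parses x[1]<-2 as an assignment.

```r
x = c(5, 6, 7)

# Intended as a comparison, but this assigns 2 to x[1];
# the condition is then the assigned value 2, which counts as TRUE
if (x[1]<-2) {
  branch_taken = TRUE
}
# x is now 2 6 7

# With spaces, the comparison that was probably meant is clear
y = c(5, 6, 7)
is_below = (y[1] < -2)   # FALSE, and y is unchanged
```

By contrast, writing if (x[1] = 2) is a syntax error, so the mistake cannot slip through silently.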

Also, Introducing Monte Carlo Methods with R, by Robert and Casella, recommends using “=”.

If I’m missing something or you disagree, please leave a comment – I would be very interested.
