November 16, 2010

Assignment operators in R: ‘=’ vs. ‘<-’

Filed under: R | csgillespie @ 7:33 pm

In R, you can use both ‘=’ and ‘<-’ as assignment operators. So what’s the difference between them, and which one should you use?

What’s the difference?

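The main difference between the two assignment operators is scope, and it shows up when they are used inside a function call: there ‘=’ names an argument of the function, while ‘<-’ still performs an assignment. It’s easiest to see the difference with an example: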

## Delete x (rm() only gives a warning if x doesn't exist)
> rm(x)
> mean(x = 1:10)  # [1] 5.5
> x               # Error: object 'x' not found

Here x is declared within the scope of the function, so it doesn’t exist in the user workspace. Now, let’s run the same piece of code using the <- operator:

> mean(x <- 1:10)  # [1] 5.5
> x                # [1] 1 2 3 4 5 6 7 8 9 10

This time the x variable is declared within the user workspace.
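Here is a minimal sketch of the same comparison that records the results instead of relying on an error message. The names m1, m2, x_after_equals and x_after_arrow are made up for illustration, and exists() is called with inherits = FALSE so that only the current workspace is checked:

```r
# Start from a clean slate so the checks below are meaningful
if (exists("x", inherits = FALSE)) rm(x)

# Inside a call, '=' matches 1:10 to the argument named x of mean()
m1 <- mean(x = 1:10)
x_after_equals <- exists("x", inherits = FALSE)  # FALSE: no x in the workspace

# Inside a call, '<-' is still an assignment, made in the calling environment
m2 <- mean(x <- 1:10)
x_after_arrow <- exists("x", inherits = FALSE)   # TRUE: x is now 1:10
```

Both calls return the same mean; only the second leaves an x behind.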

When does the assignment take place?

In the code above, you may be tempted to think that we “assign 1:10 to x, then calculate the mean.” This would be true for languages such as C, but it isn’t true in R. Consider the following function:

> a <- 1
> f <- function(a) return(TRUE)
> f(a <- a + 1); a
[1] TRUE
[1] 1

Notice that the value of a hasn’t changed! In R, the value of a will only change if we need to evaluate the argument in the function. This can lead to unpredictable behaviour:

## The output depends on the random draw; this is one possible run
> f <- function(a) if (runif(1) > 0.5) TRUE else a
> f(a <- a + 1); a
[1] 2
[1] 2
> f(a <- a + 1); a
[1] TRUE
[1] 2
> f(a <- a + 1); a
[1] 3
[1] 3
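The same effect can be shown without any randomness. This is only a sketch: ignores_arg and uses_arg are hypothetical helpers I made up for illustration, and the last few lines show one way to sidestep the problem by doing the assignment before the call:

```r
# Hypothetical helpers, named only for this illustration
ignores_arg <- function(a) TRUE  # never touches a, so the argument is never evaluated
uses_arg    <- function(a) a     # returns a, which forces evaluation of the argument

a <- 1
r1 <- ignores_arg(a <- a + 1)    # the expression a <- a + 1 never runs
a_after_ignore <- a              # still 1

r2 <- uses_arg(a <- a + 1)       # the expression runs when a is first used
a_after_use <- a                 # now 2

# A more predictable pattern: assign first, then pass the variable
a <- a + 1
r3 <- ignores_arg(a)
a_final <- a                     # 3, whatever the function does with its argument
```

Keeping the assignment on its own line means the value of a no longer depends on what happens inside the function.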

Which one should I use?

Well, there’s quite a strong following for the “<-” operator:

  • The Google R style guide prohibits the use of “=” for assignment.
  • Hadley Wickham’s style guide recommends “<-”.
  • If you want your code to be compatible with S-plus you should use “<-”.
    • Update: following a comment from David Smith below, it seems that S-plus now accepts “=”.
  • I believe that the general R community recommends using “<-”; see, for example, this link in the mailing list.

However, I tend to use the “=” operator for the following reasons:

  • The other languages I program in (Python, C and occasionally JavaScript) use the “=” operator.
  • It’s quicker to type “=” than “<-”.
  • Typically, when I declare a variable, I only want it to exist in the current workspace.
  • Since I have the pleasure of teaching undergraduates their first course in programming, using “=” avoids misleading expressions like if (x[1]<-2), which assigns 2 to x[1] rather than testing whether x[1] is less than -2.
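To see why that last expression is misleading, here is a small sketch (the vector x and the names intended and branch are made up for illustration). Without spaces, R reads <- as assignment, so the condition is the number 2, which counts as TRUE:

```r
x <- c(-5, 3)

# With a space, this is a comparison: is x[1] less than -2?
intended <- x[1] < -2            # TRUE, and x is left unchanged

# Without the space, '<-' is parsed as assignment: x[1] becomes 2,
# and the value of the assignment (2) is used as the condition
branch <- if (x[1]<-2) "taken" else "skipped"
```

After the if statement the branch has been taken and x has been silently changed to 2 3, which is probably not what a beginner meant.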

Also, Introducing Monte Carlo Methods with R, by Robert and Casella, recommends using “=”.

If I’m missing something or you disagree, please leave a comment – I would be very interested.

