Why?

November 16, 2010

Assignment operators in R: ‘=’ vs. ‘<-’

Filed under: R. Posted by csgillespie @ 7:33 pm

In R, you can use both ‘=’ and ‘<-’ as assignment operators. So what’s the difference between them, and which one should you use?

What’s the difference?

The main difference between the two assignment operators is scope. It’s easiest to see the difference with an example:

##Delete x (if it exists)
> rm(x)
> mean(x=1:10) #[1] 5.5
> x #Error: object 'x' not found

Here x is declared within the scope of the function, so it doesn’t exist in the user workspace. Now, let’s run the same piece of code using the <- operator:

> mean(x <- 1:10) # [1] 5.5
> x # [1] 1 2 3 4 5 6 7 8 9 10

This time the x variable is declared within the user workspace.
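
If you would rather check this programmatically than by eye, here is a minimal sketch. The function my_mean is a hypothetical wrapper I have made up for the illustration, and exists() is called with inherits = FALSE so that only the current environment is searched.

```r
# my_mean is a hypothetical helper (not part of the original example).
my_mean <- function(x) mean(x)

# Start from a clean slate: remove x from the current environment if present.
if (exists("x", inherits = FALSE)) rm(x)

# "=" inside a call matches the value to the argument named x.
r1 <- my_mean(x = 1:10)
x_after_equals <- exists("x", inherits = FALSE)  # no x created by the call

# "<-" inside a call assigns in the calling environment,
# then passes the value on as the first argument.
r2 <- my_mean(x <- 1:10)
x_after_arrow <- exists("x", inherits = FALSE)   # x now exists
```

Both calls return the same mean; only the second leaves an x behind.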

When does the assignment take place?

In the code above, you may be tempted to think that we “assign 1:10 to x, then calculate the mean.” This would be true for languages such as C, but it isn’t true in R. Consider the following function:

> a <- 1
> f <- function(a) return(TRUE)
> f(a <- a + 1); a
[1] TRUE
[1] 1

Notice that the value of a hasn’t changed! R evaluates function arguments lazily, so the value of a will only change if the function needs to evaluate the argument. This can lead to unpredictable behaviour:

> f <- function(a) if(runif(1)>0.5) TRUE else a
> f(a <- a+1);a
[1] 2
[1] 2
> f(a <- a+1);a
[1] TRUE
[1] 2
> f(a <- a+1);a
[1] 3
[1] 3
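
To see the same effect without the random number, here is a minimal sketch with two hypothetical functions (ignores_arg and uses_arg are names made up for this illustration). One never touches its argument; the other forces it with force().

```r
# Hypothetical functions, for illustration only.
ignores_arg <- function(arg) TRUE   # never evaluates arg
uses_arg <- function(arg) {
  force(arg)                        # explicitly evaluates arg
  TRUE
}

a <- 1

# The argument expression is never evaluated, so the assignment never runs.
ignores_arg(a <- a + 1)
a_after_ignored <- a

# The argument is forced, so the assignment runs in the calling environment.
uses_arg(a <- a + 1)
a_after_used <- a
```

After the first call a is still 1; after the second it is 2. Whether an assignment written inside a call actually happens depends entirely on the body of the function.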

Which one should I use?

Well, there’s quite a strong following for the “<-” operator:

  • The Google R style guide prohibits the use of “=” for assignment.
  • Hadley Wickham’s style guide recommends “<-”.
  • If you want your code to be compatible with S-plus you should use “<-”.
    • Update: following a comment from David Smith below, it seems that S-plus now accepts “=”.
  • I believe that the general R community recommends using “<-” – see for example this link in the mailing list.

However, I tend to always use the “=” operator for the following reasons:

  • The other languages I program in (Python, C and occasionally JavaScript) use the “=” operator.
  • It’s quicker to type “=” than “<-”.
  • Typically, when I declare a variable, I only want it to exist in the current workspace.
  • Since I have the pleasure of teaching undergraduates their first course in programming, using “=” avoids misleading expressions like if (x[1]<-2)
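
To make the last point concrete, here is a small sketch of how if (x[1]<-2) can mislead. The vectors x and y and the flag branch_taken are made up for the illustration.

```r
x <- c(5, 3)

# With spaces, this is a comparison: "is x[1] less than -2?"
intended <- x[1] < -2

y <- c(5, 3)

# Without spaces, "<-" is read as assignment: y[1] becomes 2,
# and the condition is the assigned value 2, which counts as TRUE.
if (y[1]<-2) branch_taken <- TRUE else branch_taken <- FALSE
```

The comparison is FALSE and leaves x alone, whereas the second form silently overwrites y[1] and takes the TRUE branch.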

Also, Introducing Monte Carlo Methods with R, by Robert and Casella, recommends using “=”.

If I’m missing something or you disagree, please leave a comment – I would be very interested.


26 Comments »

  1. Nice post! To be pedantic though, “=” and “<-” have the same scope; it’s just that the “=” operator is overloaded for both parameter association (in function calls) and assignment. I believe “=” is a valid assignment operator in S+ now, too.

    You might be interested in the history of *why* R uses <- for assignment, as referenced here:
    http://blog.revolutionanalytics.com/2008/12/use-equals-or-arrow-for-assignment.html

    Comment by David Smith — November 16, 2010 @ 10:09 pm

    • Nothing wrong with being pedantic – that’s why we’re interested in computing! Do you have a reference for the operator overloading bit?

      Thanks for the link. I still have some old R scripts lying around that use the “underscore” for assignment. Any idea how that fits into the grand scheme of things?

      Comment by csgillespie — November 16, 2010 @ 10:17 pm

      • You’d need to convert those _ characters to <- or =. Underscore was deprecated as an assignment operator in R a long time ago. Now it's a valid character in object_names.

        Comment by David Smith — November 16, 2010 @ 10:34 pm

      • Sorry, I wasn’t being clear (I realise that I just have to change “_” to “=”). What I meant was why did they use “_” in the first place and what brought about the change?

        I suppose that it was because S+ used “_”.

        Comment by csgillespie — November 17, 2010 @ 10:05 am

  2. It’s worth mentioning that you can also use ->, for example: 1 + 2 -> a
    This is sometimes useful, especially when using arrow up to recall the previous line and then assign the value/object, for example: lm(y~x) -> a

    Comment by Peter Cahusac — November 16, 2010 @ 10:36 pm

    • Good point, although I’ve started using emacs+ESS more and more, thereby reducing the need to (directly) use the R terminal.

      Comment by csgillespie — November 17, 2010 @ 10:07 am

  3. I’ve grown used to the <- operator and it's much easier to read the code using that. I sometimes get code from other people who use =, and it helps readability a lot if I change = to <-.

    And regarding the ease of typing = over <-: depends on your keyboard layout, I guess. I use Slovenian layout and I find it easier to write <- than =.

    Comment by Roman Luštrik — November 17, 2010 @ 12:07 am

    • I think “grown used to it” seems to be the main argument for using “<-”. I didn’t really use R from 2002-2008, so when I started using it again, “_” no longer worked. So I changed to using “=”.

      Comment by csgillespie — November 17, 2010 @ 10:09 am

      • I simply find it more pleasing to the eye than =.

        Comment by romunov — November 17, 2010 @ 10:12 am

  4. I use ‘<-’ to assign variables, and ‘=’ to assign arguments within functions:

    dat<-merge(x=dat1,y=dat2,all.x=T)

    I find it makes the code easier to read. Once you get used to typing '<-' you don't even notice the difference.

    Comment by Frank — November 17, 2010 @ 5:49 am

    • I think that’s what the majority of users do.

      Comment by csgillespie — November 17, 2010 @ 10:10 am

      • It’s actually the place where you see the biggest difference between `<-` and `=`. You could do

        test <- function(a,b) a+b
        test(x<-1,y<-2)

        which creates x and y variables in workspace, whilst test(x=1,y=2) gives you an error.

        Comment by Marek — November 17, 2010 @ 3:33 pm

  5. Thanks for the post! Mostly, I prefer using = to <- for purely esthetic reasons! (I also had a traumatic experience with _ in the old S-plus days where I renamed variables and removed all _…) The other reason is that most languages use = as an assignment and this sounds good enough… Actually, I remember getting a comment at some point that we should have used <- in Introducing Monte Carlo Methods with R but I cannot trace it back.

    Comment by xi'an — November 17, 2010 @ 6:38 am

    • I agree. I think “=” just looks nicer. However, “Beauty is in the eye of the beholder”!

      Comment by csgillespie — November 17, 2010 @ 10:16 am

  6. It gives me hope that even regular users of R find the semantics of the language a little confusing. I’ve been trying to get to grips with it for ages (at the same time as trying to lift my stats knowledge above its current risible level). But I find it very difficult to understand exactly what the language does.

    Still, it’s not so bad. Not compared to “underscore as assignment operator”.

    Comment by Phil Lord — November 17, 2010 @ 11:38 am

    • I’m trying to imagine the arguments that must have occurred in the R-mailing list when they suggested adding the “=” operator. Perhaps a bit of googling can find them ;)

      Comment by csgillespie — November 17, 2010 @ 11:41 am

  7. I didn’t like ‘<-’ at first. Now I strongly support it as a convention.

    I think the main advantage is conceptual clarity. There is no ambiguity that the operation is assigning a property to an object. You can read it as "object x gets the value y." It has directionality. When you type "x = y" it isn't clear which is taking the value of the other. As an example, "3 = x" could be interpreted as the number 3 being assigned the letter x. We only know that is not the case because of the less obvious convention that the object on the left is changed by assignment.

    Comment by Harold Baize — November 17, 2010 @ 5:48 pm

    • In some sense I completely agree with you. However,

      • When you do a function call, you would often write: f(x=y). Is that not confusing?
      • No one would/should write “3=x” (or 3 <- x)!
      • I suspect that when you say “x=y” isn’t clear, you have never actually been confused ;)
      • I tend to programme in a few languages. I already hate having to remember the syntax for: if statements, functions, objects for different languages. Case in point: I’m forever typing i++ in R code!

      As I mentioned, I do agree with you. It’s just that R is a special case and I don’t want to change for a special case.

      Comment by csgillespie — November 17, 2010 @ 6:01 pm

  8. As David Smith noted,

    mean(x = 1:10)

    is not assignment but parameter association. However,

    mean(a <- 1:10)

    works as expected, but

    mean(a = 1:10)

    does not. The latter fails because there is no parameter "a" and "=" is not interpreted as the assignment operator. To make it behave like that, you will need to brace the expression, like

    mean({a = 1:10})

    cf. the help page. It's the only real difference I know about, and as noted on the first blog-link David posted, you probably don't want to assign variables in a function call anyway.

    My reason to support and use "<-" is purely conceptual. Assignment is an oriented operation, which is not conveyed by the symmetric looking "=". Even though R may be a special case, it is good when the syntax supports the line of thought, and we should take advantage of this in R instead of trying to turn it into C-code.

    Comment by Niels Hansen — November 17, 2010 @ 6:59 pm

  9. True that one should never write “3 = x” or “3 <- x”, and both return an error message.

    Although I can't recall ever confusing the meaning of "x = y" the syntax "x <- y" is more intuitive, and I like things to make sense. For me it is worth the extra keystroke to be explicit and clear.

    Comment by Harold Baize — November 17, 2010 @ 7:02 pm

  10. See also the discussion of this on Stack Overflow from last year, which covers much of the same ground.

    Comment by richierocks — November 24, 2010 @ 5:15 pm

    • I had already included the link above, but thanks anyway.

      Comment by csgillespie — November 25, 2010 @ 2:21 pm

  11. I wouldn’t mind if it wasn’t so much more costly to type <- ([hold shift] , [release shift] -) than =.

    Comment by Oisín — October 14, 2011 @ 6:29 pm

  12. It’s worth noting that you MUST use the = operator within reference classes when declaring each component. For example, MyClass <- setRefClass( "MyClass", methods = list( myFunction = function( … etc.

    Comment by Sherlock — October 31, 2012 @ 6:02 pm

    • In your case, that would be (regular) argument names, which are always followed by a “=”.

      Comment by Roman Luštrik — November 1, 2012 @ 9:04 am

    • That’s because setRefClass is a function, and so you need to use function-calling syntax within it. It’s the same principle as, for example, if you were calculating a variance you must use var(x, na.rm = TRUE) rather than var(x, na.rm <- TRUE).

      Comment by richierocks — November 1, 2012 @ 9:05 am

