Why?

August 16, 2011

B. Ripley – The R development process (useR! 2011)

Filed under: Conferences, R, useR! 2011. Posted by csgillespie @ 8:48 am

These are my notes on the useR! 2011 invited talk. Brian Ripley has been a member of R core since 1998.

The R Development Process – An insideR’s view

R Timeline:

  • 1995: JCGS paper submitted;
  • 1997: CRAN (Mar), core team (Aug), CVS (Sept);
  • R 1.0.0, Feb 2000, 2.8MB. Many people don’t take 0.x.x versions seriously;
  • R 2.0.0, Oct 2004, 10MB (really it was 1.10.0);
  • R 2.14.0, Oct 2011, est. 22MB;
  • Roughly 4000 repository commits per year.

Looking ahead, 2.15.0 is scheduled for March 2012. R 3.0.0 has been discussed for a few years, but maintaining legacy support could be tricky, since there are currently around 3200 packages. So there are no plans for 3.0.0 in the near future. R core has 20 members, but several are inactive and only a handful actively develop R (others make valuable contributions in different ways). There are currently 80 successful submissions per week.

CRAN

CRAN is around 70GB, of which 1.9GB is the current source packages. The projection is 10,000 packages by Christmas 2016. The submission process is handled almost entirely by Kurt Hornik. Checking packages is very time-consuming: around 110 packages are submitted each week. In 2004, the CRAN-specific setting was generalised to “repos”, so that other repositories could be used. However, few public repositories have emerged. Binary packages are kept for two versions.

The R Development Process

The R core team meets in person only every couple of years. R core have total control over R. A rough criterion for membership is:

when it was more work to have someone out than in

Normal day-to-day business is conducted by email, as members are spread across a variety of time zones. The R Foundation is the legally constituted body, consisting of the R core (voting) members plus a small number of other people.

Getting features into R

R was principally developed for the benefit of the core team. Only they have votes.

Most of what you see in R is there because core members wanted it for research, teaching, support for other projects, or to develop R itself. For example, the lm function is there because of a 1998 course in regression. Since almost all R core members are mathematicians, they prefer to build very general solutions rather than specific ones.

If a core member accepts a contribution, they are committing themselves (and R core) to supporting that feature for many years. R core have regretted accepting some (even small) contributions. So most new features should go into a package, not into the core.

Timescales

  • Short: psnice, list.dirs(recursive=FALSE).
  • Year or two: Internationalization.

Portability

R core are trying to phase out bash and sh scripts and Makefiles for ease of use, maintenance, and performance. The parser for Rd2 was written in bison, but all the conversion scripts are in R. Fortran is also becoming a problem, since neither Apple nor Microsoft supports it in their SDKs. A legacy of R’s 32-bit beginnings is that there is only a single integer type. Longer integers have been on the horizon for years, but still seem tricky; they could arrive in 3.0.0.

Performance

For a long time, performance issues could be solved by waiting six months for a new computer. However, this is no longer true: instead of faster processors, we now get more cores. A new package, parallel, will support multi-core processors in the next version of R.

The future

R is heavily dependent on a small group of altruistic people who can feel that their contributions are not treated with respect. People have lives outside R, and circumstances and health do change.

Other future developments include low-level support for threading, GUIs, vector types, replacing library() with use(), and moving to a yearly release schedule.

Please note that the notes/talks section of this post is merely my notes on the presentation. I may have made mistakes: these notes are not guaranteed to be correct. Unless explicitly stated, they represent neither my opinions nor the opinions of my employers. Any errors you can assume to be mine and not the speaker’s. I’m happy to correct any errors you may spot, so just let me know! The above paragraph was stolen from Allyson Lister, who makes excellent notes when she attends conferences.


2 Comments

  1. Thanks. Very interesting read.

    Comment by vzemlys — August 16, 2011 @ 8:15 pm

  2. […] statisticians.  As “R was principally developed for the benefit of the core team” (R core member Brian Ripley, useR! 2011) and this core team is comprised of statisticians it is perhaps not surprising that ANOVA is not as […]

    Pingback by ANOVAs with Custom Contrast in R « Stack Exchange Stats Blog — September 13, 2011 @ 6:58 pm

