# Why?

## May 29, 2011

### Impact factors for statistics journals

Filed under: Publications — csgillespie @ 1:08 pm

The other day I came across “Nefarious numbers” by Douglas Arnold and Kristine Fowler on arXiv. The paper examines how easily and blatantly impact factors can be manipulated.

## What is an Impact Factor

For a particular year, the impact factor of a journal is the average number of citations received in that year by papers published in the journal during the two preceding years. The impact factor has a number of glaring flaws:

• Impact factors vary across disciplines.
• The submission to publication process in a statistical journal can take up to a year.
• The impact factor is just a single statistic out of many possible measures.
• The underlying database contains many errors.
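To make the definition concrete, here is a toy calculation of a two-year impact factor. The counts are made up purely for illustration:

```python
# Toy two-year impact factor calculation (the counts below are made up).
# 2008 IF = citations received in 2008 to papers published in 2006-2007,
#           divided by the number of citable items published in 2006-2007.
citations_2008_to_2006_07 = 450   # hypothetical citation count
items_published_2006_07 = 180     # hypothetical number of citable items

impact_factor_2008 = citations_2008_to_2006_07 / items_published_2006_07
print(round(impact_factor_2008, 2))  # 2.5
```

Note that the numerator counts *all* citations from indexed journals, which is exactly what makes the measure open to the manipulation discussed below.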

## International Journal of Nonlinear Sciences and Numerical Simulation

The Australian Research Council (ARC) recently released an evaluation listing quality ratings for over 20,000 peer-reviewed journals across various disciplines. The list was constructed through a review process involving academics, disciplinary bodies and learned academies. The outcome is that each journal is ranked A* to C, where

• A*: one of the best in its field or sub-field;
• A: very high quality;
• B: solid, though not outstanding reputation;
• C: does not meet the criteria of the higher tiers.

The ARC ranked the International Journal of Nonlinear Sciences and Numerical Simulation (IJNSNS) as a B. However, in 2008 this journal had an impact factor of 8.9 – more than double that of the next highest journal in the Applied Mathematics section. As the paper explains, the reason for the large impact factor is easy to see. In 2008, the three top-citing authors to IJNSNS were:

• Ji-Huan He, the journal’s Editor-in-Chief, who cited it 243 times within the two-year window;
• D. D. Ganji, a member of the editorial board, with 114 cites;
• Mohamed El Naschie, a regional editor, with 58 cites.

Comparing these numbers with other journals shows how extreme IJNSNS really is – the next highest impact factor in the section is around 4. Arnold and Fowler also investigate the journals in which these citations occur. They turn out to be IJNSNS itself, or special issues of other journals edited by someone on the IJNSNS board.

## Impact Factors for Statistics Journals

The ARC statistics section contains around two hundred journals. Some of these are “traditional” statistics journals, such as JASA, the RSS journals, and Biometrics. Other journals are more applied, such as Bioinformatics and Mathematical Biosciences. So in the following comparison, I only considered journals classed as “statistics” by the ISI Web of Knowledge. This leaves seventy-seven journals.

The following plot shows the two- and five-year impact factor for the seventy-seven statistical journals, grouped by the ARC rating. The red dots show the median impact factor for a particular grouping.

As would be expected, for the two-year IF there is little difference between the ARC ratings – although more than I expected. Once we calculate the five-year impact factors, the differences between ratings are clearer. Since many of the group C journals are new, a number of them don’t have a five-year impact factor.

## Outlying Statistical Journals

There are three journals that stand out from their particular groups:

• Statistical Science, a group A journal. Since this is mainly a review journal, it’s really not surprising that it has a high impact factor.
• The Journal of Statistical Software and the Stata Journal, both group C journals. Since these are “statistical computing” journals, it isn’t that surprising that they have high impact factors.

## Should we use Impact Factors?

The best answer would be no! Just read the first page of “Nefarious numbers” for a variety of reasons why we should dump impact factors. However, I suspect that impact factors will be forced on many of us as a tool to quantify our research. Therefore, while we should try to fight against them, we should also keep an eye on them for evidence of people playing the system.

## May 25, 2011

### Statistical podcast: random number seeds

Filed under: Computing, Geekery — csgillespie @ 10:39 pm

One of the podcasts I listen to each week is Security Now! Typically, this podcast has little statistical content, as its main focus is computer security, but episode 301 looks at how to generate truly random numbers for seeding pseudo random number generators.

Generating truly random numbers to use as a seed turns out to be rather tricky. For example, version 1.0 of the SSL protocol in the Netscape browser combined the time of day and the process number to seed its random number generator. However, it turns out that the process number usually falls within a small subset of all possible ids, and so is fairly easy to guess.
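To see why this is weak, consider the size of the resulting search space. The following is a hypothetical back-of-the-envelope calculation, not Netscape’s actual scheme; the uncertainty window and pid range are assumptions chosen for illustration:

```python
# Hypothetical back-of-the-envelope calculation (not Netscape's actual code):
# why seeding from time-of-day plus a process id leaves a tiny search space.
# The uncertainty window and pid range below are assumptions for illustration.
time_uncertainty_s = 3600            # attacker knows the hour a key was made
microsecond_resolution = 1_000_000   # time of day measured in microseconds
pid_range = 30_000                   # plausible range of process ids

search_space = time_uncertainty_s * microsecond_resolution * pid_range
print(f"{search_space:.2e} candidate seeds")  # 1.08e+14
print(search_space < 2**64)                   # True: far below a 64-bit seed
```

Roughly 10^14 candidates is well within reach of a determined attacker with modest hardware, whereas a genuinely random 64-bit seed would offer about 1.8 × 10^19 possibilities.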

Recent advances indicate that we can get “almost true” randomness by taking multiple snapshots of the processor’s counter. Since the counter ticks around 3 billion times each second, we can use it to create an effectively random seed.
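A rough sketch of the idea in Python (my own illustration, not the exact method from the podcast): sample a high-resolution counter repeatedly and hash the samples together, so the unpredictable low-order bits are condensed into a single seed.

```python
# Sketch: scheduling jitter makes the low-order bits of a high-resolution
# counter hard to predict; hashing many samples condenses that jitter
# into a single seed value. This is an illustration of the principle only.
import hashlib
import time

# keep only the noisy low byte of each counter reading
samples = bytes(time.perf_counter_ns() & 0xFF for _ in range(64))
digest = hashlib.sha256(samples).digest()
seed = int.from_bytes(digest[:8], "big")  # a 64-bit seed
```

In practice you would use the operating system’s entropy source (e.g. Python’s `secrets` module) rather than rolling your own, but the sketch shows why counter jitter is a plausible raw source of randomness.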

To find out more, listen to the podcast. The discussion on random seeds begins mid-way through the podcast.

## May 12, 2011

### Makefiles and Sweave

Filed under: Computing, latex, R — csgillespie @ 8:19 pm

A Makefile is a simple text file that controls the compilation of a target file. The key benefit of using a Makefile is that it uses file time stamps to determine whether a particular action is needed. In this post we discuss a simple Makefile that compiles a tex file containing a number of \include statements, where the included files are Sweave files.

Suppose we have a master tex file called master.tex. In this file we have:

\include{chapter1}
\include{chapter2}
\include{chapter3}
....

where chapter1, chapter2 and chapter3 are Sweave files. Ideally, when we compile master.tex, we only want to re-run Sweave on chapter1.Rnw if the time stamp of chapter1.tex is older than that of chapter1.Rnw. This conditional compilation becomes even more important as the number of Sweave files grows.

## Meta-rules

To avoid duplication in a Makefile, it’s handy to use meta-rules. These rules specify how to convert from one file format to another. For example,

.Rnw.tex:
	R CMD Sweave $<

is a meta-rule for converting an Rnw file to a tex file. Note that recipe lines in a Makefile must be indented with a tab character. In the above meta-rule, $< stands for the name of the source file, i.e. chapter1.Rnw. Another helpful meta-rule is:

.Rnw.R:
	R CMD Stangle $<

which converts an Rnw file to an R file. We will also have a meta-rule for converting from .tex to .pdf.
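That .tex to .pdf meta-rule takes the following form (it is the same rule that appears in the complete Makefile at the end of the post); the repeated pdflatex runs are needed to resolve cross-references and the bibliography:

```makefile
.tex.pdf:
	pdflatex $<
	bibtex $*
	pdflatex $<
	pdflatex $<
```

Here $* expands to the file name with its suffix stripped, which is exactly what bibtex expects.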

For meta-rules to work, we have to list all the file suffixes that we will convert between. This means we have to include the following line:

.SUFFIXES: .tex .pdf .Rnw .R

## Files to convert

Suppose we have a master tex file called master.tex and a sweave file chapter1.Rnw. This means we need to convert from:

• master.tex to master.pdf
• chapter1.Rnw to chapter1.tex
• chapter1.Rnw to chapter1.R

Obviously, we don’t want to write down every file we need – especially if we have more than one Sweave file. Instead, we just want to state the master file and the Rnw files. There are a couple of ways of doing this; the following way combines flexibility and simplicity. We first define the master and Rnw files:

# Suppose we have three Sweave files with a single master file
MAIN = master
RNWINCLUDES = chapter1 chapter2 chapter3

Now we add in the relevant file extensions:

TEX = $(RNWINCLUDES:=.tex)
RFILES = $(RNWINCLUDES:=.R)
RNWFILES = $(RNWINCLUDES:=.Rnw)

In the Makefile, whenever we use the $(TEX) variable, it is automatically expanded to

chapter1.tex chapter2.tex chapter3.tex

A similar rule applies to $(RFILES) and $(RNWFILES).

## Conversion rules

We now define the file conversion rules. When we build our pdf file we want to:

• build the tex files from the Rnw files only if the Rnw files have changed, or if the tex files don’t exist;
• build the pdf file from the tex file only if master.tex has changed, one of the tex files has changed, or the pdf file doesn’t exist.

We can accomplish this with the following rule:

$(MAIN).pdf: $(TEX) $(MAIN).tex

Typically, I also have dependencies on a graphics directory and a BibTeX file:

$(MAIN).pdf: $(TEX) $(MAIN).tex refs.bib graphics/*.pdf

We also have a rule to generate the R files:

R: $(RFILES)

## Cleaning up

We also add a rule to clean up after ourselves:

clean:
	rm -fv $(MAIN).pdf $(MAIN).tex $(TEX) $(RFILES)
	rm -fv *.aux *.dvi *.log *.toc *.bak *~ *.blg *.bbl *.lot *.lof
	rm -fv *.nav *.snm *.out *.pyc \#*\# _region_* _tmp.* *.vrb
	rm -fv Rplots.pdf *.RData

## The complete Makefile

In the Makefile below:

• make all – creates master.pdf;
• make clean – deletes all files created as part of the latex and sweave process;
• make R – creates the R files from the Rnw files.

.SUFFIXES: .tex .pdf .Rnw .R

MAIN = master
RNWINCLUDES = chapter1 chapter2 chapter3
TEX = $(RNWINCLUDES:=.tex)
RFILES = $(RNWINCLUDES:=.R)
RNWFILES = $(RNWINCLUDES:=.Rnw)

all: $(MAIN).pdf
$(MAIN).pdf: $(TEX) $(MAIN).tex

R: $(RFILES)

view: all
	acroread $(MAIN).pdf &

.Rnw.R:
	R CMD Stangle $<

.Rnw.tex:
	R CMD Sweave $<

.tex.pdf:
	pdflatex $<
	bibtex $*
	pdflatex $<
	pdflatex $<

clean:
	rm -fv $(MAIN).pdf $(MAIN).tex $(TEX) $(RFILES)
	rm -fv *.aux *.dvi *.log *.toc *.bak *~ *.blg *.bbl *.lot *.lof
	rm -fv *.nav *.snm *.out *.pyc \#*\# _region_* _tmp.* *.vrb
	rm -fv Rplots.pdf *.RData

## Useful links

• Jeromy Anglim’s post on Sweave and Make;
• Ross Ihaka’s Makefile on Sweave;
• Instead of using a Makefile, you could also use a shell script;