Why?

May 25, 2011

Statistical podcast: random number seeds

Filed under: Computing, Geekery. Posted by csgillespie @ 10:39 pm

One of the podcasts I listen to each week is Security Now! Typically, this podcast has little statistical content, as its main focus is computer security, but episode 301 looks at how to generate truly random numbers for seeding pseudo-random number generators.

Generating truly random numbers to use as a seed turns out to be rather tricky. For example, an early version of the Netscape browser seeded the random number generator in its SSL implementation with a combination of the time of day and the process number. However, process numbers usually fall within a small subset of all possible IDs, so the seed is fairly easy to guess.
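To see why a seed built from the time of day and a process number is weak, here is a toy sketch in Python. It is not Netscape's actual algorithm: the functions `make_key` and `recover_seed`, the way the two values are combined, and the search ranges are all assumptions made for illustration, and Python's `random` module simply stands in for the browser's generator.

```python
import random


def make_key(timestamp: int, pid: int) -> int:
    """Toy key generation: seed a PRNG from the time (in seconds) and a process ID."""
    # Hypothetical way of mixing the two values; assumes pid < 100000.
    rng = random.Random(timestamp * 100_000 + pid)
    return rng.getrandbits(64)


def recover_seed(key: int, approx_time: int, window: int, max_pid: int):
    """Try every (time, pid) pair near a guessed time until the key is reproduced."""
    for t in range(approx_time - window, approx_time + window + 1):
        for pid in range(1, max_pid + 1):
            if make_key(t, pid) == key:
                return t, pid
    return None
```

If an attacker knows the time to within five seconds and assumes the process ID is below 2000, a call such as `recover_seed(key, guessed_time, 5, 2000)` only has to try 22,000 candidate seeds. The real search space depends on the clock resolution and the operating system, but the principle is the same.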

Recent advances indicate that we can get “almost true” randomness by taking multiple snapshots of the processor counter. Since the counter advances through around 3 billion values each second, the exact value captured at any instant is hard to predict, so we can combine several snapshots to create a seed that is close to truly random.
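A rough sketch of the snapshot idea in Python is given below. It makes several assumptions: `time.perf_counter_ns()` is used as a stand-in for the processor counter (it is a high-resolution timer, not the raw CPU cycle counter), the function name `counter_seed` and the choice of 64 samples are hypothetical, and SHA-256 is just one reasonable way of mixing the samples together.

```python
import hashlib
import time


def counter_seed(samples: int = 64) -> bytes:
    """Mix repeated snapshots of a fast counter into a 32-byte seed."""
    digest = hashlib.sha256()
    for _ in range(samples):
        # The exact value of each snapshot depends on unpredictable timing jitter.
        tick = time.perf_counter_ns() & 0xFFFFFFFFFFFFFFFF  # keep 64 bits
        digest.update(tick.to_bytes(8, "little"))
    return digest.digest()
```

This is only meant to illustrate the idea. For anything security related in Python, `os.urandom` or the `secrets` module is the safer choice, since they draw on the operating system's entropy pool.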

To find out more, listen to the podcast. The discussion on random seeds begins mid-way through the podcast.

May 12, 2011

Makefiles and Sweave

Filed under: Computing, latex, R. Posted by csgillespie @ 8:19 pm

A Makefile is a simple text file that controls compilation of a target file. The key benefit of using a Makefile is that make uses file time stamps to determine if a particular action is needed. In this post we discuss how to use a simple Makefile that compiles a tex file that contains a number of \include statements. The files referred to by the \include statements are Sweave files.

Suppose we have a master tex file called master.tex. In this file we have:

\include{chapter1}
\include{chapter2}
\include{chapter3}
....

where the files chapter1, chapter2, chapter3 are Sweave files. Ideally, when we compile master.tex, we only want to run Sweave if the time stamp of chapter1.tex is older than the time stamp of chapter1.Rnw, or if chapter1.tex doesn’t exist yet. This conditional compiling is even more important when we have a number of Sweave files.
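The time stamp test that make applies can be sketched in a few lines of Python. This is only an illustration of the rule, not how make is implemented; the function name `needs_rebuild` is made up for this example.

```python
import os


def needs_rebuild(target: str, source: str) -> bool:
    """Return True if target is missing or older than source."""
    if not os.path.exists(target):
        return True
    # Rebuild only when the source was modified after the target.
    return os.path.getmtime(target) < os.path.getmtime(source)
```

For example, `needs_rebuild("chapter1.tex", "chapter1.Rnw")` is true when chapter1.Rnw has been edited since the last Sweave run, and false otherwise.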

Meta-rules

To avoid duplication in a Makefile, it’s handy to use meta-rules. These rules specify how to convert from one file format to another. For example,

.Rnw.tex:
    R CMD Sweave $<

is a meta-rule (make calls these suffix rules) for converting an Rnw file to a tex file. In the above meta-rule, $< is the name of the source file, e.g. chapter1.Rnw. Other helpful meta-rules are:

.Rnw.R:
    R CMD Stangle $<

which is used to convert between Rnw and R files. We will also have a meta-rule for converting from .tex to .pdf.

For meta-rules to work, we have to list all the file suffixes that we will convert between. This means we have to include the following line:

.SUFFIXES: .tex .pdf .Rnw .R
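One way to think about these meta-rules is as a lookup table from a pair of suffixes to a command template. The Python sketch below mirrors the two rules above; the names `SUFFIX_RULES` and `command_for` are hypothetical, and make's own matching is more involved than a dictionary lookup.

```python
from pathlib import PurePath

# (source suffix, target suffix) -> command, mirroring the meta-rules in this post.
SUFFIX_RULES = {
    (".Rnw", ".tex"): "R CMD Sweave {source}",
    (".Rnw", ".R"): "R CMD Stangle {source}",
}


def command_for(source: str, target_suffix: str) -> str:
    """Return the command that converts source into a file with target_suffix."""
    source_suffix = PurePath(source).suffix
    template = SUFFIX_RULES[(source_suffix, target_suffix)]
    # {source} plays the role of $< in the Makefile.
    return template.format(source=source)
```

Here `command_for("chapter1.Rnw", ".tex")` gives the command `R CMD Sweave chapter1.Rnw`, which is what make runs when it applies the .Rnw.tex rule to chapter1.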

Files to convert

Suppose we have a master tex file called master.tex and a sweave file chapter1.Rnw. This means we need to convert from:

  • master.tex to master.pdf
  • chapter1.Rnw to chapter1.tex
  • chapter1.Rnw to chapter1.R

Obviously, we don’t want to write down every file we need, especially if we have more than one Sweave file. Instead, we just want to state the master file and the Rnw files. There are a couple of ways of doing this, but the following way combines flexibility and simplicity. We first define the master and Rnw files:


##Suppose we have three Sweave files with a single master file
MAIN = master
RNWINCLUDES = chapter1 chapter2 chapter3

Now we add in the relevant file extensions

TEX = $(RNWINCLUDES:=.tex)
RFILES = $(RNWINCLUDES:=.R)
RNWFILES = $(RNWINCLUDES:=.Rnw)

In the Makefile, whenever we use the $(TEX) variable, it is automatically expanded to

chapter1.tex chapter2.tex chapter3.tex

A similar rule applies to $(RFILES) and $(RNWFILES).
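The substitution `$(VAR:=.tex)` simply appends a suffix to every word in the variable. A small Python equivalent, with the helper name `add_suffix` invented for this example, looks like this:

```python
def add_suffix(names: str, suffix: str) -> str:
    """Append suffix to each whitespace-separated word, like $(VAR:=suffix) in make."""
    return " ".join(name + suffix for name in names.split())


RNWINCLUDES = "chapter1 chapter2 chapter3"
TEX = add_suffix(RNWINCLUDES, ".tex")
RFILES = add_suffix(RNWINCLUDES, ".R")
RNWFILES = add_suffix(RNWINCLUDES, ".Rnw")
```

Adding a fourth chapter then only requires changing the single `RNWINCLUDES` line, which is the point of defining the file lists this way.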

Conversion rules

We now define the file conversion rules. When we build our pdf file we want to:

  • build the tex file from Rnw file only if the Rnw files have changed or if the tex file doesn’t exist.
  • build the pdf file from the tex file only if master.tex file has changed or one of the Rnw files has changed, or the pdf file doesn’t exist.

We can accomplish this with the following rule:

$(MAIN).pdf: $(TEX) $(MAIN).tex

Typically, I also have dependencies on a graphics directory and a bibtex file:

$(MAIN).pdf: $(TEX) $(MAIN).tex refs.bib graphics/*.pdf

We also have a conversion rule to R files.

R: $(RFILES)
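To check our understanding of how these rules chain together, the following Python sketch works out which commands would be run for a given set of time stamps. It is a simplified model under stated assumptions: time stamps are plain numbers in a dictionary, the functions `out_of_date` and `plan` are hypothetical, and the bibtex and graphics dependencies are ignored.

```python
def out_of_date(target, prerequisites, mtimes):
    """A target is out of date if it is missing or older than any prerequisite."""
    if target not in mtimes:
        return True
    return any(mtimes.get(p, float("inf")) > mtimes[target] for p in prerequisites)


def plan(mtimes, chapters, main="master"):
    """List the commands needed to bring main.pdf up to date."""
    mtimes = dict(mtimes)  # work on a copy
    now = max(mtimes.values(), default=0) + 1
    steps = []
    for chapter in chapters:
        if out_of_date(f"{chapter}.tex", [f"{chapter}.Rnw"], mtimes):
            steps.append(f"R CMD Sweave {chapter}.Rnw")
            mtimes[f"{chapter}.tex"] = now  # the tex file is now fresh
    prerequisites = [f"{c}.tex" for c in chapters] + [f"{main}.tex"]
    if out_of_date(f"{main}.pdf", prerequisites, mtimes):
        steps.append(f"pdflatex {main}.tex")
    return steps
```

If only chapter1.Rnw is newer than its tex file, the plan contains a Sweave step for chapter1 followed by pdflatex, and chapter2 is left alone. If nothing has changed, the plan is empty.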

Cleaning up

We also use make to clean up after ourselves:

clean:
    # master.tex is a source file, so it is not deleted here
    rm -fv $(MAIN).pdf $(TEX) $(RFILES)
    rm -fv *.aux *.dvi *.log *.toc *.bak *~ *.blg *.bbl *.lot *.lof
    rm -fv *.nav *.snm *.out *.pyc \#*\# _region_* _tmp.* *.vrb
    rm -fv Rplots.pdf *.RData

The complete Makefile

In the Makefile below:

  • make all – creates master.pdf;
  • make clean – deletes all files created as part of the latex and sweave process;
  • make R – creates the R files from the Rnw files.

.SUFFIXES: .tex .pdf .Rnw .R

MAIN = master
RNWINCLUDES = chapter1 chapter2 chapter3
TEX = $(RNWINCLUDES:=.tex)
RFILES = $(RNWINCLUDES:=.R)
RNWFILES = $(RNWINCLUDES:=.Rnw)

all: $(MAIN).pdf

$(MAIN).pdf: $(TEX) $(MAIN).tex

R: $(RFILES)

view: all
    acroread $(MAIN).pdf &

.Rnw.R:
    R CMD Stangle $<

.Rnw.tex:
    R CMD Sweave $<

.tex.pdf:
    pdflatex $<
    bibtex $*
    pdflatex $<
    pdflatex $<

clean:
    # master.tex is a source file, so it is not deleted here
    rm -fv $(MAIN).pdf $(TEX) $(RFILES)
    rm -fv *.aux *.dvi *.log *.toc *.bak *~ *.blg *.bbl *.lot *.lof
    rm -fv *.nav *.snm *.out *.pyc \#*\# _region_* _tmp.* *.vrb
    rm -fv Rplots.pdf *.RData

Useful links

  • Jeromy Anglim’s post on Sweave and Make;
  • Ross Ihaka’s Makefile on Sweave;
  • Instead of using a Makefile, you could also use a shell script.

January 25, 2011

CPU and GPU trends over time

Filed under: Computing, R. Posted by csgillespie @ 4:04 pm

GPUs seem to be all the rage these days. At the last Bayesian Valencia meeting, Chris Holmes gave a nice talk on how GPUs could be leveraged for statistical computing. Recently Christian Robert arXived a paper with parallel computing firmly in mind. In two weeks’ time I’m giving an internal seminar on using GPUs for statistical computing. To start the talk, I wanted a few graphs that show CPU and GPU evolution over the last decade or so. This turned out to be trickier than I expected.

After spending an afternoon searching the internet (mainly Wikipedia), I came up with a few nice plots.

Intel CPU clock speed

CPU clock speed for a single CPU has been fairly static in the last couple of years, hovering around 3.4 GHz. Of course, we shouldn’t fall completely for the Megahertz myth, but one avenue of speed increase has been blocked:

Computational power per die

Although single CPUs have been limited, the computational power per die has still been increasing, due to the rise of multi-core machines.

GPUs vs CPUs

When we compare GPUs with CPUs over the last decade in terms of floating point operations per second (FLOPS), we see that GPUs appear to be far ahead of CPUs.

 

Sources and data

  • You can download the data files and R code used to generate the above graphs.
    • If you find them useful, please drop me a line.
    • I’ll probably write further posts on GPU computing, but these won’t go through the R-bloggers site (since it has little to do with R).
  • Data for Figures 1 & 2 was obtained from “Is Parallel Programming Hard, And, If So, What Can You Do About It?” This book got the data from Wikipedia.
  • Data for Figure 3 came mainly from Wikipedia and the odd mailing list post.
  • I believe these graphs show the correct general trend, but the actual numbers have been obtained from mailing lists, Wikipedia, etc. Use with care.