# Why?

## February 16, 2016

### RANDU: The case of the bad RNG

Filed under: Computing, R — Tags: , , — csgillespie @ 12:15 pm

The German Federal Office for Information Security (BSI) has established
criteria for quality random number generator (rng):

• A sequence of random numbers has a high probability of containing no identical consecutive elements.
• A sequence of numbers which is indistinguishable from true random’ numbers (tested using statistical tests.
• It should be impossible to calculate, or guess, from any given sub-sequence, any previous or future values in the sequence.
• It should be impossible, for all practical purposes, for an attacker to calculate, or guess the values used in the random number algorithm.

Points 3 and 4 are crucial for many applications. Everytime you make a
phone call, contact to a wireless point, pay using your credit card random
numbers are used.

Designing a good random number generator is hard and as a general rule you should never try to. R comes with many good quality random generators. The default generator is the Mersenne-Twister. This rng has a huge period of $2^{19937}-1$ (how many random numbers are generated before we have a repeat).

## Linear congruential generators

A linear congruential generator (lcg) is a relatively simple rng (popular in the 60’s and 70’s). It has a simple form of

$r_{i}=(ar_{i-1}+b) \textrm{ mod }m, \quad i=1, 2, \ldots, m$

where $latexr_0$ is the initial number, known as the seed, and $$a,b,m$$ are the multiplier, additive constant and modulo respectively. The parameters are all integers.

The modulo operation means that at most $m$ different numbers can be generated
before the sequence must repeat – namely the integers $0,1,2, \ldots, m-1$. The
actual number of generated numbers is $h \leq m$, called the period of
the generator.

The key to random number generators is in setting the parameters.

### RANDU

RANDU was a lcg with parameters $m=2^{31}, a=65539$ and $b=0$. Unfortunately this is a spectacularly bad choice of
parameters. On noting that $a=65,539=2^{16}+3$, then

$r_{i+1} = a r_i = 65539 \times r_i = (2^{16}+3)r_i \;.$

So

$r_{i+2} = a\;r_{i+1} = (2^{16}+3) \times r_{i+1} = (2^{16}+3)^2 r_i \;.$

On expanding the square, we get

$r_{i+2} = (2^{32}+6\times 2^{16} + 9)r_i = [6 (2^{16}+3)-9]r_i = 6 r_{i+1} - 9 r_i \;.$

Note: all these calculations should be to the mod $2^{31}$. So there is a large
correlation between the three points!

If compare randu to a standard rng (code in a gist)

It’s obvious that randu doesn’t produce good random numbers. Plotting  $x_i$, $x_{i-1}$ and $x_{i-2}$ in 3d

### Generating the graphics

The code is all in a gist and can be run via

 devtools::source_gist("https://gist.github.com/csgillespie/0ba4bbd8da0d1264b124") 

You can then get the 3d plot via

 scatterplot3d::scatterplot3d(randu[,1], randu[,2], randu[,3], angle=154) ## Interactive version threejs::scatterplot3js(randu[,1], randu[,2], randu[,3]) 

## February 15, 2016

### Shiny benchmarks

Filed under: Computing, R — Tags: , — csgillespie @ 5:49 pm

A couple of months ago, the first version of benchmarkme was released. Around 140 machines have now been benchmarked.

From the fastest (an Apple i7) which ran the tests in around 10 seconds, to the slowest (an Atom(TM) CPU N450 @ 1.66GHz) which took 420 seconds! Other interesting statistics:

• Around 6% of people ran BLAS optimised versions of R;
• No-one (except for machines that I used) ran a byte compiled version of the package.

I intend to write to a blog post or two on BLAS and byte compiling, but for the meantime you can investigate the results via the new shiny interface. The package is still only available on github and can be installed via:

 ## Update the package install.packages(c("drat", "httr", "Matrix", "shiny")) drat::addRepo("csgillespie") install.packages("benchmarkme", type="source") 

You then load the package in the usual way

 library("benchmarkme") ## View past results plot_past() ## shine() # Needs shiny ## get_datatable_past() # Needs DT 

 ## This will take somewhere between 0.5 and 5 minutes ## Increase runs if you have a higher spec machine res = benchmark_std(runs=3) 
 plot(res) ## shine(res) ## get_datatable(res) 
 ## You can control exactly what is uploaded. See details below. upload_results(res)