Graphical processing units (GPUs) are all the rage these days. Most journal issues would be incomplete if at least one article didn’t mention the word “GPUs”. Like any good geek, I was initially interested with the idea of using GPUs for statistical computing. However, last summer I messed about with GPUs and the sparkle was removed. After looking at a number of papers, it strikes me that reviewers are forgetting to ask basic questions when reviewing GPU papers.
- For speed comparisons, do the authors compare a GPU with a multi-core CPU. In many papers, the comparison is with a single-core CPU. If a programmer can use CUDA, they can certainly code in pthreads or openMP. Take off a factor of eight when comparing to a multi-core CPU.
- Since a GPU has (usually) been bought specifically for the purpose of the article, the CPU can be a few years older. So, take off a factor of two for each year of difference between a CPU and GPU.
- I like programming with doubles. I don’t really want to think about single precision and all the difficulties that entails. However, many CUDA programs are compiled as single precision. Take off a factor of two for double precision.
- When you use a GPU, you split the job in blocks of threads. The number of threads in each block depends on the type of problem under consideration and can have a massive speed impact on your problem. If your problem is something like matrix multiplication, where each thread multiplies two elements, then after a few test runs, it’s straightforward to come up with an optimal thread/block ratio. However, if each thread is a stochastic simulation, it now becomes very problem dependent. What could work for one model, could well be disastrous for another.
So in many GPU articles the speed comparisons could be reduced by a factor of 32!
Just to clarify, I’m not saying that GPUs have no future, rather, there has been some mis-selling of their potential usefulness in the (statistical) literature.
GPUs seem to be all the rage these days. At the last Bayesian Valencia meeting, Chris Holmes gave a nice talk on how GPUs could be leveraged for statistical computing. Recently Christian Robert arXived a paper with parallel computing firmly in mind. In two weeks time I’m giving an internal seminar on using GPUs for statistical computing. To start the talk, I wanted a few graphs that show CPU and GPU evolution over the last decade or so. This turned out to be trickier than I expected.
After spending an afternoon searching the internet (mainly Wikipedia), I came up with a few nice plots.
Intel CPU clock speed
CPU clock speed for a single cpu has been fairly static in the last couple of years – hovering around 3.4Ghz. Of course, we shouldn’t fall completely into the Megahertz myth, but one avenue of speed increase has been blocked:
Computational power per die
Although single CPUs have been limited, due to the rise of multi-core machines, the computational power per die has still been increasing
GPUs vs CPUs
When we compare GPUs with CPUs over the last decade in terms of Floating point operations (FLOPs), we see that GPUs appear to be far ahead of the CPUs
Sources and data
- You can download the data files and R code used to generate the above graphs.
- If you find them useful, please drop me a line.
- I’ll probably write further posts on GPU computing, but these won’t go through the R-bloggers site (since it has little to do with R).
- Data for Figures 1 & 2 was obtained from “Is Parallel Programming Hard, And, If So, What Can You Do About It?” This book got the data from Wikipedia
- Data from Figure 3 was mainly from Wikipedia and the odd mailing list post.
- I believe these graphs show the correct general trend, but the actual numbers have been obtained from mailing lists, Wikipedia, etc. Use with care.
This post is based on chapter 1.4.3 of Advanced Markov Chain Monte Carlo. Previous posts on this book can be found via the AMCMC tag.
The ratio-of-uniforms was initially developed by Kinderman and Monahan (1977) and can be used for generating random numbers from many standard distributions. Essentially we transform the random variable of interest, then use a rejection method.
The algorithm is as follows:
Repeat until a value is obtained from step 2.
- Generate uniformly over .
- If . return as the desired deviate.
The uniform region is
In AMCMC they give some R code for generate random numbers from the Gamma distribution.
I was going to include some R code with this post, but I found this set of questions and solutions that cover most things. Another useful page is this online book.
Thoughts on the Chapter 1
The first chapter is fairly standard. It briefly describes some results that should be background knowledge. However, I did spot a few a typos in this chapter. In particular when describing the acceptance-rejection method, the authors alternate between and .
Another downside is that the R code for the ratio of uniforms is presented in an optimised version. For example, the authors use
EXP1 = exp(1) as a global constant. I think for illustration purposes a simplified, more illustrative example would have been better.
This book review has been going with glacial speed. Therefore in future, rather than going through section by section, I will just give an overview of the chapter.
Midway through 2008 a new (serious) internet vulnerability was discovered by Dan Kaminsky. Dan realised that there was a serious flaw in the way the Internet’s domain name system works. This flaw was critical, because it allowed the bad guys to redirect users to malicious sites without detection.
What is DNS
The Domain name system (DNS) converts easy to remember web addresses into their numerical IP counter parts. For example, rather than trying to remember 18.104.22.168, we just need to remember www.ncl.ac.uk. So when you type an URL into your address bar, you first ask your DNS resolver if it knows the IP address. If the resolver doesn’t know it will ask a DNS server for directions.
In the good old days, no-one worried about security and consequently the DNS protocol was lax. First, all queries to the DNS server were typically through the same port. Second, each request had a unique ID (from 1- 65,536), however, the ids were generated in a completely predicable manner, i.e. 1, 2, 3, ….65536. This meant that an attacker could respond to the resolver (with a fake IP address) before the DNS server got a chance.
The obvious solution to this is for each request to use a (pseudo) random port and id.
Checking your DNS address
All this is old news, and most DNS resolvers should have been updated – however a few haven’t. To test your whether your DNS resolver is doing something sensible, run the DNS spoofing test provided by grc. Essentially, the test makes a few thousand consecutive calls to your DNS resolver and stores the id and port number of the request.
The test takes a few minutes and is completely painless. At the end you get a few graphs showing your port and id numbers. My ID distribution graph seems to indicate that the resolver that I’m using is issuing random query transaction IDs.