Graphical processing units (GPUs) are all the rage these days. Most journal issues would be incomplete if at least one article didn’t mention the word “GPUs”. Like any good geek, I was initially interested with the idea of using GPUs for statistical computing. However, last summer I messed about with GPUs and the sparkle was removed. After looking at a number of papers, it strikes me that reviewers are forgetting to ask basic questions when reviewing GPU papers.
For speed comparisons, do the authors compare a GPU with a multi-core CPU. In many papers, the comparison is with a single-core CPU. If a programmer can use CUDA, they can certainly code in pthreads or openMP. Take off a factor of eight when comparing to a multi-core CPU.- Since a GPU has (usually) been bought specifically for the purpose of the article, the CPU can be a few years older. So, take off a factor of two for each year of difference between a CPU and GPU.
- I like programming with doubles. I don’t really want to think about single precision and all the difficulties that entails. However, many CUDA programs are compiled as single precision. Take off a factor of two for double precision.
- When you use a GPU, you split the job in blocks of threads. The number of threads in each block depends on the type of problem under consideration and can have a massive speed impact on your problem. If your problem is something like matrix multiplication, where each thread multiplies two elements, then after a few test runs, it’s straightforward to come up with an optimal thread/block ratio. However, if each thread is a stochastic simulation, it now becomes very problem dependent. What could work for one model, could well be disastrous for another.

