Graphical processing units (GPUs) are all the rage these days. Most journal issues would be incomplete if at least one article didn’t mention the word “GPUs”. Like any good geek, I was initially interested with the idea of using GPUs for statistical computing. However, last summer I messed about with GPUs and the sparkle was removed. After looking at a number of papers, it strikes me that reviewers are forgetting to ask basic questions when reviewing GPU papers.
- For speed comparisons, do the authors compare a GPU with a multi-core CPU. In many papers, the comparison is with a single-core CPU. If a programmer can use CUDA, they can certainly code in pthreads or openMP. Take off a factor of eight when comparing to a multi-core CPU.
- Since a GPU has (usually) been bought specifically for the purpose of the article, the CPU can be a few years older. So, take off a factor of two for each year of difference between a CPU and GPU.
- I like programming with doubles. I don’t really want to think about single precision and all the difficulties that entails. However, many CUDA programs are compiled as single precision. Take off a factor of two for double precision.
- When you use a GPU, you split the job in blocks of threads. The number of threads in each block depends on the type of problem under consideration and can have a massive speed impact on your problem. If your problem is something like matrix multiplication, where each thread multiplies two elements, then after a few test runs, it’s straightforward to come up with an optimal thread/block ratio. However, if each thread is a stochastic simulation, it now becomes very problem dependent. What could work for one model, could well be disastrous for another.
So in many GPU articles the speed comparisons could be reduced by a factor of 32!
Just to clarify, I’m not saying that GPUs have no future, rather, there has been some mis-selling of their potential usefulness in the (statistical) literature.
In a recent post, I discussed some statistical consultancy I was involved with. I was quite proud of the nice ggplot2 graphics I had created. The graphs nicely summarised the main points of the paper:
I’ve just had the proofs from the journal, and next to the graphs there is the following note:
It is not usual BJS style to include 95 per cent confidence intervals in K-M
curves. Could you please re-draw Figs 1 & 2 omitting these and INCLUDING ALL
FOUR CURVES IN A SINGLE GRAPH. (If you wish to include 95% c.i., the data could
be produced in tabular form instead.)
They have a policy of not including CI on graphs? So instead of a single nice graphic, they now want a graph and a table with (at least) 9 rows and 5 columns?
Each year I try to carry out some statistical consultancy to give me experience in other areas of statistics and also to provide teaching examples. Last Christmas I was approached by a paediatric consultant from the RVI who wanted to carry out prospective survival analysis. The consultant, Bruce Jaffray, had performed Nissen fundoplication surgery on 230 children. Many of the children had other medical conditions such as cerebral palsy or low BMI. He was interested in the factors that affected patients’ survival.
We fitted a standard cox proportional hazards model. The following covariates were significant:
- cerebral palsy
- need for revision surgery. This was when the child had to return to hospital for more surgery.
- an interaction term between gastrostomy & cerebral palsy.
The interaction term was key to getting a good model fit. The figures (one of which is shown below) were constructed using ggplot2 and R. The referees actually commented on the (good) quality statistical work and nice figures! Always nice to read. Unfortunately, there isn’t a nice survival to ggplot2 interface. I had to write some rather hacky R code
The main finding of the paper was the negative effect of cerebral palsy and gastrostomy on survival. Unfortunately, if a child had a gastronomy or had cerebral palsy then survival was dramatically reduced. The interaction effect was necessary, otherwise we would have predicted that all children with a gastronomy and cerebral palsy wouldn’t survive.
- There was a rather strange and strong gender effect – male survival was greater than female.
- The revision covariate was also significant – children who needed their fundoplication redone had increased survival. At first glance this is strange – the operation had to be redone, yet this was good for survival. However, this was really a red herring. Essentially children who had their surgery redone had by definition survived a minimum amount of time. I think something a bit more sophisticated could have been done with this variable, but the numbers weren’t that large.