Today I've been working on such code for predicting epidemic outcomes. It was taking 14 seconds to run, which isn't too bad (especially considering some other work I'm doing takes 6 hours using OpenMP on an 8-core machine), but in my inspection I'd spotted some odd R code.
The oddity here is that var is one-dimensional and so is coef[1,]. This matrix multiply is really just something like exp(sum(var*coef[,1])). So I tried that, and the code took 6 times longer to run. Matrix multiplications are efficient in R, even in one dimension.
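To make the equivalence concrete, here is a minimal sketch with made-up numbers. The names var_vec and coef_row are placeholders for illustration, not the variables from the real model. For two plain vectors of the same length, %*% computes the inner product, so both forms give the same value (the matrix form just hands it back as a 1x1 matrix).

```r
# Made-up values; stand-ins for the real var and one row of coef
var_vec  <- c(0.5, 1.2, -0.3)
coef_row <- c(0.1, 0.4, 0.25)

# Matrix form: %*% on two equal-length vectors gives their inner product
via_matmul <- exp(var_vec %*% coef_row)

# Elementwise form: multiply, sum, then exponentiate
via_sum <- exp(sum(var_vec * coef_row))

# Same value; %*% returns a 1x1 matrix, so drop the dimensions before comparing
stopifnot(isTRUE(all.equal(as.numeric(via_matmul), via_sum)))
```

Which form is faster will depend on the surrounding code, so it is worth wrapping each in system.time() on your own data rather than assuming.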
So I switched back to the original code. Time to get profiling. This is easy in R.
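For anyone who hasn't profiled R before, a minimal sketch using the built-in Rprof() and summaryRprof() looks something like the following. slow_predict() is a made-up stand-in that transposes a matrix in a loop; it is not the real epidemic code, and the matrix size and loop count are arbitrary choices so that the sampler has something to catch.

```r
# Hypothetical stand-in for the slow code being profiled
slow_predict <- function(n = 10000) {
  coef <- matrix(runif(200 * 200), 200, 200)
  total <- 0
  for (i in seq_len(n)) {
    total <- total + sum(t(coef))  # transpose repeated on every iteration
  }
  total
}

prof_file <- tempfile(fileext = ".out")

Rprof(prof_file)          # start the sampling profiler, writing to prof_file
result <- slow_predict()
Rprof(NULL)               # stop profiling

prof <- summaryRprof(prof_file)
print(head(prof$by.self)) # time spent in each function itself, largest first
```

Something along these lines, run on the real code, produces a per-function breakdown of where the time goes.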
This told me most of the time was spent in the t function, within the function I was playing with earlier. Then I noticed that pretty much wherever coef was used, it was transposed with the t function.
I refactored the code to do t(coef) once, and then pass the transposed matrix around. That sped the code up from the original 14 seconds to 3 seconds. Sweet.
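The shape of that refactoring, sketched on made-up data, is roughly as follows. predict_slow() and predict_fast() are hypothetical names, and the bodies are only a guess at the general pattern, not the real model functions.

```r
set.seed(1)
coef <- matrix(rnorm(6), nrow = 2)      # made-up 2 x 3 coefficient matrix
vars <- matrix(rnorm(3 * 4), ncol = 3)  # four made-up covariate rows

# Before: the transpose happens inside the function, so t() runs on every call
predict_slow <- function(v, coef) exp(v %*% t(coef)[, 1])

# After: the caller transposes once and passes the transposed matrix in
predict_fast <- function(v, coef_t) exp(v %*% coef_t[, 1])

coef_t <- t(coef)  # done once, outside the loop

slow <- sapply(seq_len(nrow(vars)), function(i) predict_slow(vars[i, ], coef))
fast <- sapply(seq_len(nrow(vars)), function(i) predict_fast(vars[i, ], coef_t))

# The refactored version must give the same answers
stopifnot(isTRUE(all.equal(slow, fast)))
```

On a toy example this small the saving is invisible; it only shows up when the function is called many times, as it was in the profiled code.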
Note that at each stage I check that the new output is the same as the old output - it can be useful to write some little test functions at this point, but doing formal test-driven development can be a bit tricky for statistical applications in R.
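A little test function of that kind might look like the sketch below. check_same() is a hypothetical helper, and the numbers are invented to stand in for a saved run of the original code. It uses all.equal() rather than identical() so that harmless floating-point drift from reordering arithmetic doesn't trigger a failure.

```r
# Hypothetical helper: compare new output with a saved reference, within a tolerance
check_same <- function(new, old, tol = 1e-8) {
  res <- all.equal(new, old, tolerance = tol)
  if (!isTRUE(res)) {
    stop("Output changed: ", paste(res, collapse = "; "))
  }
  invisible(TRUE)
}

# Save the output of the original code once (made-up numbers here)
ref_file <- tempfile(fileext = ".rds")
old_output <- c(0.12, 0.34, 0.56)
saveRDS(old_output, ref_file)

# After each refactoring step, rerun and compare against the saved reference
new_output <- old_output + 1e-12  # pretend result with tiny floating-point drift
check_same(new_output, readRDS(ref_file))
```

The tolerance is a judgment call: too tight and every reordering of a sum fails, too loose and a real change slips through.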
There's still a bit of work to do to make this code a bit more general purpose. Currently it works on predictions for one particular country - we want to do predictions over half of Africa!