Ever since I took a look at using MATLAB, I’ve been disappointed with the performance of its GPU random number generator. In MATLAB 2012a, for example, it’s not much faster than the CPU implementation on my GPU hardware. Consider the following code
function gpuRandTest2012a(n)
mydev=gpuDevice();
disp('CPU - Mersenne Twister');
tic
CPU = rand(n);
toc
sg = parallel.gpu.RandStream('mrg32k3a','Seed',1);
parallel.gpu.RandStream.setGlobalStream(sg);
disp('GPU - mrg32k3a');
tic
Rg = parallel.gpu.GPUArray.rand(n);
wait(mydev);
toc
Running this on MATLAB 2012a on my laptop gives me the following typical times (If you try this out yourself, the first run will always be slower for various reasons I’ll not go into here)
>> gpuRandTest2012a(10000)
CPU - Mersenne Twister
Elapsed time is 1.330505 seconds.
GPU - mrg32k3a
Elapsed time is 1.006842 seconds.
Running the same code on MATLAB 2012b, however, gives a very pleasant surprise with typical run times looking like this
CPU - Mersenne Twister
Elapsed time is 1.590764 seconds.
GPU - mrg32k3a
Elapsed time is 0.185686 seconds.
So, generation of random numbers using the GPU is now over 7 times faster than CPU generation on my laptop hardware–a significant improvment on the previous implementation.
New generators in 2012b
The MATLAB developers went a little further in 2012b though. Not only have they significantly improved performance of the mrg32k3a combined multiple recursive generator, they have also implemented two new GPU random number generators based on the . Here are the timings for the generation of 100 million random numbers in MATLAB 2012b
Get the code –
CPU - Mersenne Twister
Elapsed time is 1.370252 seconds.
GPU - mrg32k3a
Elapsed time is 0.186152 seconds.
GPU - Threefry4x64-20
Elapsed time is 0.145144 seconds.
GPU - Philox4x32-10
Elapsed time is 0.129030 seconds.
Bear in mind that I am running this on the relatively weak GPU of my laptop! If anyone runs it on something stronger, I’d love to hear of your results.
Laptop model: Dell XPS L702X
CPU: @2Ghz software overclockable to 2.9Ghz. 4 physical cores but total 8 virtual cores due to Hyperthreading.
GPU: with 144 CUDA Cores. Graphics clock: 590Mhz. Processor Clock:1180 Mhz. 3072 Mb DDR3 Memeory
RAM: 8 Gb
OS: Windows 7 Home Premium 64 bit.
MATLAB: 2012a/2012b
- Full Post
No comments:
Post a Comment