Friday, September 17, 2010

CUDA Programming with Ruby

Need GPU computing power in your Ruby program? Great! SpeedGo Computing is developing Ruby bindings for CUDA, called sgc-ruby-cuda. Take advantage of your Nvidia CUDA-enabled graphics cards with Ruby now.

Currently, only part of the CUDA Driver API is covered. More components, such as the CUDA Runtime API, will be added over time to make the bindings as complete as possible.

CUDA Programming with Ruby


require 'rubycu'

include SGC::CU

SIZE = 10
c = CUContext.new

d = CUDevice.get(0) # Get the first device.
c.create(0, d) # Use this device in this CUDA context.

m = CUModule.new
m.load("vadd.ptx") # 'nvcc -ptx vadd.cu'
# vadd.cu is a CUDA kernel program.

da = CUDevicePtr.new # Pointer to device memory.
db = CUDevicePtr.new
dc = CUDevicePtr.new

da.mem_alloc(4*SIZE) # Each Int32 is 4 bytes.
db.mem_alloc(4*SIZE) # Allocate device memory.
dc.mem_alloc(4*SIZE)

ha = Int32Buffer.new(SIZE) # Allocate host memory.
hb = Int32Buffer.new(SIZE)
hc = Int32Buffer.new(SIZE)
hd = Int32Buffer.new(SIZE)

(0...SIZE).each { |i| ha[i] = i }
(0...SIZE).each { |i| hb[i] = 2 }
(0...SIZE).each { |i| hc[i] = ha[i] + hb[i] }
(0...SIZE).each { |i| hd[i] = 0 }

memcpy_htod(da, ha, 4*SIZE) # Transfer inputs to device.
memcpy_htod(db, hb, 4*SIZE)

f = m.get_function("vadd") # Get the kernel function from the module.
f.set_param(da, db, dc, SIZE) # Set the kernel arguments.
f.set_block_shape(SIZE) # SIZE threads per block.
f.launch_grid(1) # Execute kernel program in the device.

memcpy_dtoh(hd, dc, 4*SIZE) # Transfer outputs to host.

puts "A\tB\tCPU\tGPU"
(0...SIZE).each { |i|
    puts "#{ha[i]}\t#{hb[i]}\t#{hc[i]}\t#{hd[i]}"
}


da.mem_free # Free device memory.
db.mem_free
dc.mem_free

c.detach # Release context.
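
As a side note (a sketch only, not part of the listing above): the cleanup calls run only if everything before them succeeds, so in a longer script it may be worth wrapping the device work in begin/ensure so the device memory is always freed and the context detached even if a call in between raises. Using the same objects and calls as above:

begin
    da.mem_alloc(4*SIZE) # Allocate device memory as before.
    db.mem_alloc(4*SIZE)
    dc.mem_alloc(4*SIZE)
    # ... memcpy_htod, kernel launch, memcpy_dtoh as in the listing above ...
ensure
    da.mem_free # Always free device memory.
    db.mem_free
    dc.mem_free
    c.detach # Always release the context.
end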

/* vadd.cu */
extern "C" {
    __global__ void vadd(const int* a,
                         const int* b,
                         int* c,
                         int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            c[i] = a[i] + b[i];
    }
}
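
The bounds check if (i < n) in the kernel matters as soon as you launch more threads than there are elements. The example above fits everything into a single block (set_block_shape(SIZE) and launch_grid(1)), which only works while SIZE stays within the per-block thread limit. A rough sketch of how a larger vector might be launched with the same calls; the 256-thread block size and the value of n are made-up numbers for illustration, and da, db, dc would need to be allocated for n elements:

n = 1_000_000 # Hypothetical larger vector length.
threads_per_block = 256 # Assumed block size.
num_blocks = (n + threads_per_block - 1) / threads_per_block
f.set_param(da, db, dc, n)
f.set_block_shape(threads_per_block)
f.launch_grid(num_blocks) # Extra threads simply fail the i < n check.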
Although the kernel program still needs to be written in CUDA C, these Ruby bindings provide a first bridging step toward GPU computing in Ruby.

How to execute?


$ ruby extconf.rb
checking for main() in -lcuda... yes
creating Makefile
$ make
...
g++ -shared -o rubycu.so rubycu.o ...
$ nvcc -ptx vadd.cu
$ ruby -I . test.rb
A B CPU GPU
0 2 2 2
1 2 3 3
2 2 4 4
3 2 5 5
4 2 6 6
5 2 7 7
6 2 8 8
7 2 9 9
8 2 10 10
9 2 11 11
Cool! The summation of two vectors is performed in the GPU.
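
To have the script verify the result instead of eyeballing the table, a small check over the host buffers (hc holds the CPU reference, hd the values copied back from the GPU) could look like this sketch:

mismatches = (0...SIZE).count { |i| hc[i] != hd[i] }
puts(mismatches.zero? ? "PASSED" : "FAILED: #{mismatches} mismatches")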


6 comments:

  1. Very, very cool! There are definitely ruby folks like myself who have a keen interest in this. I'm looking forward to following the progress.

  2. Thank you very much for showing interest. More help from the community will certainly make the development much more successful. Although I'm primarily developing on Linux, I would like others to test on various platforms. Next steps include creating gems, documenting the Ruby CUDA API, etc.

  3. hah ! time to learn ruby ! :))
    thx for the info !

  4. This is an old post... I wonder if there is any progress...

    I hope something like this will speed up this little example:

    [1,2,3,4,5].repeated_permutation(100).map(&:join)

    This takes my CPU about 15 minutes to process... maybe the CUDA libs can speed it up to seconds?

  5. Thanks for sharing a very helpful and educational blog. IOS Applications Development

  6. Okay, maybe you will need a big collection of drivers some day. You can find NVidia driver downloads at http://bitdrivers.com/manufacturers/nvidia. On my new laptop I downloaded all the software from there and had no problems, so keep it in mind if you ever need to.
