Sunday, September 26, 2010
Friday, September 17, 2010
Unigine crew: CUDA vs OpenCL vs SPU Part IV
Which language or library you choose for your software development has a great and prolonged impact on the software. I just came across a simple yet interesting benchmark. More details on why such numbers are obtained would be even more enlightening.
Labels:
CUDA,
gpu,
OpenCL,
programming
CUDA Programming with Ruby
Need GPU computing power in your Ruby program? Great! SpeedGo Computing is developing Ruby bindings for CUDA, called sgc-ruby-cuda. Take advantage of your Nvidia CUDA-enabled graphics cards with Ruby now.
Currently, only part of the CUDA Driver API is included. More components such as the CUDA Runtime API will be included to make it as complete as possible.
require 'rubycu'
include SGC::CU
SIZE = 10
c = CUContext.new
d = CUDevice.get(0) # Get the first device.
c.create(0, d) # Use this device in this CUDA context.
m = CUModule.new
m.load("vadd.ptx") # 'nvcc -ptx vadd.cu'
# vadd.cu is a CUDA kernel program.
da = CUDevicePtr.new # Pointer to device memory.
db = CUDevicePtr.new
dc = CUDevicePtr.new
da.mem_alloc(4*SIZE) # Each Int32 is 4 bytes.
db.mem_alloc(4*SIZE) # Allocate device memory.
dc.mem_alloc(4*SIZE)
ha = Int32Buffer.new(SIZE) # Allocate host memory.
hb = Int32Buffer.new(SIZE)
hc = Int32Buffer.new(SIZE)
hd = Int32Buffer.new(SIZE)
(0...SIZE).each { |i| ha[i] = i }
(0...SIZE).each { |i| hb[i] = 2 }
(0...SIZE).each { |i| hc[i] = ha[i] + hb[i] }
(0...SIZE).each { |i| hd[i] = 0 }
memcpy_htod(da, ha, 4*SIZE) # Transfer inputs to device.
memcpy_htod(db, hb, 4*SIZE)
f = m.get_function("vadd")
f.set_param(da, db, dc, SIZE)
f.set_block_shape(SIZE)
f.launch_grid(1) # Execute kernel program in the device.
memcpy_dtoh(hd, dc, 4*SIZE) # Transfer outputs to host.
puts "A\tB\tCPU\tGPU"
(0...SIZE).each { |i|
puts "#{ ha[i]}\t#{hb[i]}\t#{hc[i]}\t#{hd[i] }"
}
da.mem_free # Free device memory.
db.mem_free
dc.mem_free
c.detach # Release context.
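The listing above launches a single block of SIZE threads, which only works while the vector fits in one block. For larger vectors you would split the work into multiple blocks; the arithmetic is just ceiling division. Here is a plain-Ruby sketch of that calculation (`launch_config` is a name made up for illustration, not part of sgc-ruby-cuda):

```ruby
# Compute a 1-D launch configuration: enough fixed-size blocks
# to cover n elements. Pure Ruby; no CUDA required.
def launch_config(n, block_size = 256)
  grid_size = (n + block_size - 1) / block_size # ceiling division
  [grid_size, block_size]
end

grid, block = launch_config(1000) # => [4, 256]
# 4 blocks of 256 threads give 1024 threads; the kernel's
# "if (i < n)" guard ignores the 24 extra threads.
```

The same numbers would then feed `set_block_shape` and `launch_grid` in the listing above.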
Although the kernel program still needs to be written in CUDA C, these Ruby bindings provide a first bridging step towards GPU computing in Ruby.
/* vadd.cu */
extern "C" {
__global__ void vadd(const int* a,
const int* b,
int* c,
int n)
{
int i = blockIdx.x * blockDim.x + threadIdx.x;
if (i < n)
c[i] = a[i] + b[i];
}
}
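To see what each GPU thread computes, the kernel can be modeled in plain Ruby: every (block, thread) pair maps to one global index `i`, and threads whose index falls beyond `n` do nothing. This is only an illustrative sketch (`vadd_cpu` is a made-up name, not part of the bindings):

```ruby
# Plain-Ruby model of the vadd kernel's thread hierarchy.
def vadd_cpu(a, b, n, block_dim, grid_dim)
  c = Array.new(n, 0)
  grid_dim.times do |block_idx|
    block_dim.times do |thread_idx|
      i = block_idx * block_dim + thread_idx # same index formula as the kernel
      c[i] = a[i] + b[i] if i < n            # the "if (i < n)" guard
    end
  end
  c
end

a = (0...10).to_a
b = [2] * 10
vadd_cpu(a, b, 10, 10, 1) # => [2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
```

On the GPU, of course, the two loops run as parallel hardware threads rather than sequentially.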
How to execute?
Cool! The summation of two vectors is performed on the GPU.
$ ruby extconf.rb
checking for main() in -lcuda... yes
creating Makefile
$ make
...
g++ -shared -o rubycu.so rubycu.o ...
$ nvcc -ptx vadd.cu
$ ruby -I . test.rb
A B CPU GPU
0 2 2 2
1 2 3 3
2 2 4 4
3 2 5 5
4 2 6 6
5 2 7 7
6 2 8 8
7 2 9 9
8 2 10 10
9 2 11 11
Labels:
CUDA,
gpu,
programming,
Ruby
Tuesday, September 7, 2010
High Performance for All
Parallel programming is much more affordable now that multi-core CPUs and programmable GPUs have become commodity products. A decade ago, even a minimal dual-socket system, equipped with lower-clocked CPU and RAM, would cost a typical desktop user a relative fortune; nowadays dual-core systems are basically everywhere. The adoption of dual-core systems is not really because they are affordable, but simply because users have not been given a choice not to go multi-core.
A decade ago it was non-trivial to me: why should I accept a lower-clocked CPU and RAM in order to go multi-processing? Wouldn't that slow down all my applications that use only a single core? Fortunately, this problem is now less severe with CPUs that adjust their clock speed dynamically, the so-called turbo mode. We can enjoy the benefits of both high clock speed and multiple cores across different applications.
Moving forward, do commodity products make HPC a commodity service? How is HPC doing in the enterprise?
Check out the report published by Freeform Dynamics: High Performance for All
Labels:
cpu,
gpu,
hpc,
multi core