Sunday, August 15, 2010

Parallelizing Matrix Multiplication using Cilk++ in Two Lines

Following the parallelization of matrix multiplication using OpenMP in Parallelizing Matrix Multiplication using OpenMP in One Line, can we do the same using Cilk++?

Parallel Matrix Multiplication using Cilk++

/* matrix.cilk */
const int size = 1000;

float a[size][size];
float b[size][size];
float c[size][size];

int cilk_main()
// Initialize buffers.
for (int i = 0; i < size; ++i) {
for (int j = 0; j < size; ++j) {
a[i][j] = (float)i + j;
b[i][j] = (float)i - j;
c[i][j] = 0.0f;

// Compute matrix multiplication.
// C <- C + A x B
cilk_for (int i = 0; i < size; ++i) {
for (int j = 0; j < size; ++j) {
for (int k = 0; k < size; ++k) {
c[i][j] += a[i][k] * b[k][j];

return 0;
The cilk_main() is the entry function synonymous to main() in a typical C++ program, but cilk_main() function uses Cilk++ linkage such that you can spawn and sync in the function. This update is trivial.Effectively, there is only one line of non-trivial update to parallelize the serial matrix multiplication.
$ g++ -O2 matrix.cpp -o matrix
$ cilk++ -O2 matrix.cilk -o matrix-cilk
$ time ./matrix
real 0m12.613s
user 0m12.392s
sys 0m0.030s
$ time ./matrix-cilk
real 0m6.775s
user 0m12.591s
sys 0m0.134s
The performance improvement is encouraging. You can try out with quad-core or hex-core if you have the machine.

No comments:

Post a Comment