__Parallel Matrix Multiplication using TBB__

/* matrix-tbb.cpp */We've moved the computation of the matrix multiplication into the#include <tbb/parallel_for.h>

#include <tbb/blocked_range.h>using namespace tbb;

const int size = 1000;

float a[size][size];

float b[size][size];

float c[size][size];

class Multiply

{

public:

void operator()(blocked_range<int> r) const {

for (int i = r.begin(); i != r.end(); ++i) {

for (int j = 0; j < size; ++j) {

for (int k = 0; k < size; ++k) {

c[i][j] += a[i][k] * b[k][j];

}

}

}

}

};

int main()

{

// Initialize buffers.

for (int i = 0; i < size; ++i) {

for (int j = 0; j < size; ++j) {

a[i][j] = (float)i + j;

b[i][j] = (float)i - j;

c[i][j] = 0.0f;

}

}

// Compute matrix multiplication.

// C <- C + A x B

parallel_for(blocked_range<int>(0,size), Multiply());

return 0;

}

*class Multiply*which takes in the range of i-iterations to work on. The

*parallel_for*internally split the range [0,size) into multiple blocks. Multiple workers can then work on different non-overlapping blocks in parallel.

Once the computation is organized into functions which can dynamically work on different parts of the computation, it's relatively easy to proceed.

$ g++ -O2 matrix.cpp -o matrix

$ g++ -O2 matrix-tbb.cpp -ltbb -o matrix-tbb

$ time ./matrix

real 0m12.971s

user 0m12.489s

sys 0m0.052s

$ time ./matrix-tbb

real 0m7.857s

user 0m12.734s

sys 0m0.282s

You can use blocked_range2d, that means you have both for loops from tbb. It performs better.

ReplyDeleteYes. You are right. This post is more to complement other posts in OpenMP, MPI, and Cilk++

ReplyDeleteThank you for providing this code. I have tested it. But on my pc (intel Due CPU P8400), the time for matrix-tbb and matrix are almost the same. Even, the time of matrix-tbb is greater than matrix. Do you know why? Thank you.

ReplyDeleteIt's easier to test with larger matrix size such as 2000 x 2000. The 1000 x 1000 matrix might be running too fast on your system.

ReplyDeleteFurthermore, the CPU dynamic clock might disrupt the running time readings.

thnx for this....

ReplyDeleteWhat causes the user & system statistic to increase?

ReplyDeleteHello. I`ve tried this code, using MVS 2012 Prof.

ReplyDeleteand got this:

error : identifier "blocked_range" is undefined

error : identifier "parallel_for" is undefined

Screenshot is here: http://lbex.narod.ru/strange_err.jpg

include tbb.h and use the tbb.dll or tbb_debug.dll

ReplyDelete