Saturday, July 31, 2010

Parallel Programming - What Are The Options?

There are simply too many parallel programming languages and libraries to keep track of. Many of them are no longer actively developed, or are difficult to get working on a decent operating system. What are the practical options currently available for multi-core CPUs or GPUs?
  1. OpenMP
    • Hardware: Shared memory multi-core CPU system.
    • Parallelization: Use compiler directives, e.g. #pragma omp parallel {} in C/C++ (or !$omp parallel in Fortran), to parallelize loops or code regions.
    • Supported by most mainstream compilers, e.g. GCC, Intel and Microsoft Visual C++.
    • Compilers without OpenMP support ignore the directives and compile the code as a serial program.
    • Very good for incremental parallelization.
  2. Cilk++
    • Hardware: Shared memory multi-core CPU system.
    • Parallelization: Use new keywords in C++, namely cilk_spawn to invoke a Cilk linkage function asynchronously, cilk_sync to synchronize with locally spawned functions, and cilk_for to parallelize a for-loop.
    • The Cilk++ runtime system takes care of thread scheduling, which eases nested parallelization tremendously and maintains a certain level of efficiency.
    • Requires Cilk++ compiler and Cilk++ runtime system.
    • Very good for parallelizing dynamic codes with low overhead.
  3. TBB
    • Hardware: Shared memory multi-core CPU system.
    • Parallelization: C++ function objects or C++0x lambda expressions as work units, parallelizing with template functions e.g. parallel_do, parallel_for, parallel_reduce, parallel_pipeline, etc. Concurrent storage classes e.g. concurrent_vector are also provided.
    • Portable to any platform with good C++ support.
    • Uses C++ templates and function objects extensively. C++ beginners might have difficulty reading and writing the code.
    • Allows many customization options at the task level, which can get complicated and messy, but threads are abstracted away, i.e. thread scheduling is taken care of.
    • Recommended only for heavy C++ users.
  4. Pthreads or a thread library built into the language
    • Hardware: Shared memory multi-core CPU system.
    • Parallelization: Provides a library of functions to create, destroy, and synchronize threads.
    • Pthreads is well supported on Unix/Linux systems, but Windows requires an external library.
    • Low-level, explicit manipulation of threads.
    • Not recommended for general parallel programming tasks.
  5. OpenCL
    • Hardware: Shared memory multi-core CPU system or OpenCL-supported GPU.
    • Parallelization: Provides a library of functions to execute a kernel function in massively parallel fashion on a supported device.
    • Supported by ATI Stream SDK and Nvidia OpenCL SDK.
    • Requires OpenCL runtime support for the targeted devices.
    • Well suited for data parallel or streaming computation.
    • Not recommended for direct use in general parallel programming; use wrappers for OpenCL instead.
  6. CUDA
    • Hardware: CUDA-enabled Nvidia GPU.
    • Parallelization: Provides a kernel invocation method to execute a kernel function in massively parallel fashion on a CUDA-enabled Nvidia GPU. The invocation requires the CUDA compiler (nvcc) to parse its special syntax, of the form kernel_method<<<grid_dim, block_dim, shared_mem_size, stream>>>(arguments).
    • Supported by Nvidia CUDA SDK.
    • Requires CUDA compiler and CUDA runtime system.
    • Well suited for data parallel or streaming computation.
    • The CUDA Programming Guide documents well the requirements for achieving good performance on a CUDA-enabled Nvidia GPU.
    • Recommended for GPU programming on Nvidia GPUs.
  7. Brook+
    • Hardware: Shared memory multi-core CPU system or CAL-supported ATI GPU.
    • Parallelization: Allows the specification of kernel functions that accept streams of data. A kernel function is invoked like a normal function. Specifying a kernel function requires the Brook+ compiler to parse the kernel syntax.
    • Supported by ATI CAL and x86 CPU backend.
    • Requires Brook+ compiler and Brook+ stream runtime system.
    • Well suited for data parallel computation.
    • AMD has been promoting the use of OpenCL for ATI GPU programming. Brook+ is open source; however, it is no longer actively developed.
  8. MPI
    • Hardware: Shared memory multi-core CPU system or cluster of computers.
    • Parallelization: Provides a library of functions for message passing between processes, i.e. point-to-point and collective communications.
    • Supported by third-party libraries such as MPICH, Open MPI, etc.
    • Requires communication runtime system.
    • Low-level manipulation of buffers and process-to-process communication.
    • Very popular for programming HPC clusters, but not recommended for general parallel programming.
  9. PVM
    • Hardware: Shared memory multi-core CPU system or distributed systems.
    • Parallelization: Provides a library of functions for message passing between tasks.
    • Supported by a third-party library such as Netlib PVM3.
    • Uses standard network interfaces such as TCP/IP for better interoperability across distributed systems.
    • Low-level manipulation of buffers and task-to-task communication.
  10. Charm++
    • Hardware: Shared memory multi-core CPU system or distributed systems.
    • Parallelization: Object-oriented C++ work units called chares, which communicate with other chares through proxy objects.
    • Schedules computations based on the availability of data.
    • Requires Charm++ compiler and Charm++ runtime system.
  11. uC++
    • Hardware: Shared memory multi-core CPU system.
    • Parallelization: Provides C++ coroutines for independent executions.
    • The runtime system schedules virtual processors using OS kernel threads.
    • Requires uC++ compiler and uC++ kernel.
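
To make the incremental style of option 1 (OpenMP) concrete, here is a minimal sketch, not taken from any particular project. The function name sum_of_squares is made up for illustration; the only OpenMP feature used is the standard parallel for directive with a reduction clause.

```cpp
// Hypothetical helper for illustration: sums i*i for i in [0, n).
// The only change from the serial version is the single #pragma line.
long long sum_of_squares(int n) {
    long long total = 0;

    // With OpenMP enabled (e.g. -fopenmp on GCC), the loop iterations are
    // split across threads and each thread's private copy of "total" is
    // combined at the end. Without OpenMP, the pragma is ignored and the
    // loop simply runs serially.
    #pragma omp parallel for reduction(+:total)
    for (int i = 0; i < n; ++i) {
        total += static_cast<long long>(i) * i;
    }
    return total;
}
```

The same source builds with or without OpenMP enabled and should give the same result either way, which is what makes it convenient to parallelize one loop at a time.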

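For contrast, the explicit style of option 4 can be sketched with std::thread, the thread library standardized in C++11 (still the C++0x draft at the time of writing). This is only an illustration under that assumption; the function name threaded_sum and the chunking scheme are hypothetical choices, not a recommended design.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical helper for illustration: sums "data" using num_threads
// explicitly created threads, each working on one contiguous chunk.
long long threaded_sum(const std::vector<int>& data, unsigned num_threads) {
    num_threads = std::max(1u, num_threads);

    // One result slot per thread, so no locking is needed.
    std::vector<long long> partial(num_threads, 0);
    std::vector<std::thread> workers;

    // Chunk size, rounded up so that every element is covered.
    const std::size_t chunk = (data.size() + num_threads - 1) / num_threads;

    for (unsigned t = 0; t < num_threads; ++t) {
        const std::size_t begin = std::min(data.size(), t * chunk);
        const std::size_t end = std::min(data.size(), begin + chunk);
        // Thread creation is explicit.
        workers.emplace_back([&data, &partial, t, begin, end] {
            long long local = 0;
            for (std::size_t i = begin; i < end; ++i) {
                local += data[i];
            }
            partial[t] = local;
        });
    }

    // Synchronization is explicit too: wait for every worker to finish.
    for (auto& worker : workers) {
        worker.join();
    }
    return std::accumulate(partial.begin(), partial.end(), 0LL);
}
```

Work splitting, thread creation and joining are all written by hand here, which is the kind of low-level bookkeeping that the directive-based and task-based options take care of.
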
2 comments:

  1. Have you tried .NET Parallel Extensions? Its development seems to be quite aggressive.

    You might also want to check out Chapel, X10, and Fortress which are being developed by Cray, IBM, and Sun respectively.

  2. I haven't tried the .NET Parallel Extensions. Its support for task parallelism looks good for dynamic parallelism.

    Chapel, X10, and Fortress are interesting. There are already some releases available to try out.
