SpeedGo Computing
Speed breaking computational problems with multi-core CPU and GPU

[Resolved] Skype crashed on Fedora 19 (2013-07-04)
Running Skype on Fedora 19 core dumps immediately. By luck, I found a workaround that uses the mesa-libGL library from Fedora 17.
Steps:

1. Download mesa-libGL-8.0.4-1.fc17.i686.rpm from the Fedora 17 package archive.

2. Extract the rpm file:

   $ rpm2cpio mesa-libGL-8.0.4-1.fc17.i686.rpm | cpio -idv

3. Run Skype with the extracted library:

   $ LD_LIBRARY_PATH=usr/lib /usr/bin/skype
That seems to work.

Personal Supercomputing System with Quad GPUs (2012-06-18)
The secrets of the new Kepler GPUs have been revealed. The Kepler-based graphics cards have been studied extensively for gaming performance, and most reviews suggest you don't need the upgrade. Furthermore, the supported PCI-E 3.0 is of little to no use.
Well, it's probably a different story for CUDA programs. Here's the setup I'm going to use for testing CUDA programs extensively.
Asus P8Z99-V ...

Being Nvidia CUDA Certified Programmer! (2011-06-26)
It takes some courage and effort to take the Nvidia CUDA certification exam. You'll have to pay S$350 for it, yet there is no guarantee of real use in business or career. The exam questions are perfect for squeezing out all your brain juice.
After much feedback, a long wait, and delayed plans, I finally received an email confirming that I am now an Nvidia CUDA certified programmer. Better late than never.

The Choice is Yours: CUDA in C++ or Ruby (2011-05-09)
See the output here: Ruby Query Output
See the output here: C++ Query Output

Web Seminar: Programming GPUs Beyond CUDA (2011-05-03)
GPU/CUDA programming is easy if we ignore the performance, or even the correctness, of the program. It becomes tough when performance is critical: one has to optimize very hard for the specific hardware. Fortunately, GPU hardware performance improves drastically every two years. Unfortunately, that performance is not portable across different generations of GPUs.
Prof Chen from Tsing Hua ...

First Release of SGC Ruby CUDA - Beginning of a Long Path (2011-04-30)
Today we put up the first release of SGC Ruby CUDA v0.1.0, as a means to attract Rubyists to try out GPU programming in their new toy projects, and to encourage HPC developers to evaluate whether Ruby is a good fit for their HPC applications.
When important software libraries are not available in Ruby, we certainly cannot expect much Ruby usage in the area. As time is running short ...

GPU Computing with Ruby (2011-04-24)
Presented at RedDotRubyConf 2011 - PechaKucha Night Singapore.
GPU Computing with Ruby - view more presentations from myxman.

Using SGC-Ruby-CUDA on the Newly Launched Amazon EC2 Cluster GPU (2010-11-19)
Wondering if GPU computing works for you? No budget for a system with a decent GPU? Installation and configuration too much trouble? You can now try out SGC-Ruby-CUDA on Amazon EC2 with a prepared system image, located in the US East (Virginia) zone, called SGCRubyCUDA.1, which is available as a community AMI.
Compile the rubycu shared library and run the tests:
[root@ip-10-17-130-174 sgc-ruby-cuda.git]# rake
(in ...

GPU Anywhere with Cloud Computing (2010-11-16)
Simulation taking months to run? Buying and maintaining new systems causing too much hassle? Perhaps a Cluster GPU instance would be a good candidate to save time and trouble. A cloud solution is an excellent platform for proof of concept before committing to a large in-house system.
Paying $2.10 per hour (Amazon pricing as of 16 Nov 2010) gets you the following spec:
22 GB of memory
33.5 EC2 Compute Units (2 x Intel ...

Parallel Programming Knowledge Is a Must-Have Skill for Wall Street (2010-09-26)
Parallel programming knowledge is a must-have skill for Wall Street.

Unigine crew: CUDA vs OpenCL vs SPU Part IV (2010-09-17)
Which language or library you choose for your software development has a great and prolonged impact on the software. I just came across a simple yet interesting benchmark: Unigine crew: CUDA vs OpenCL vs SPU Part IV. More details on why such numbers are obtained would be even more enlightening.

CUDA Programming with Ruby (2010-09-17)
Need GPU computing power in your Ruby program? Great! SpeedGo Computing is developing Ruby bindings for CUDA, called sgc-ruby-cuda. Take advantage of your Nvidia CUDA-enabled graphics cards with Ruby now. Currently, only part of the CUDA Driver API is included. More components, such as the CUDA Runtime API, will be added to make it as complete as possible.

CUDA Programming with Ruby

require '...

High Performance for All (2010-09-07)
Parallel programming is much more affordable now as multi-core CPUs and programmable GPUs become commodity products.
Unlike a decade ago, when even a minimal dual-socket system equipped with lower-clocked CPUs and RAM would cost a relative fortune for a typical desktop user, dual-core systems are basically everywhere nowadays. The use of dual-core systems is not really because they are affordable, but ...

AMD's Bulldozer vs Intel's Hyper-Threading? (2010-08-25)
Is AMD's so-called Strong Thread approach in the Bulldozer module really that compelling? Extra cores are added when a processor can't operate at a faster clock speed. That's a good and easy way to expand a product line with effectively faster products, even though the products may NOT be any faster, depending on whether the applications take advantage of the multiple cores. But fully duplicating x86 ...

Parallelizing Matrix Multiplication using MPI (2010-08-17)
MPI is a popular mechanism in high performance computing. It works for both cluster and shared-memory environments. Why don't we simply use MPI when it works for both environments? Why do we care about OpenMP, Cilk++, etc.? Perhaps that depends on the complexity of the applications you are dealing with.

Parallel Matrix Multiplication using MPI

/* matrix-mpi.cpp */
#include <mpi.h>

const int size ...

Parallelizing Matrix Multiplication using TBB (2010-08-15)
Parallelizing matrix multiplication using TBB isn't too difficult.
It's just a little more work than OpenMP or Cilk++.

Parallel Matrix Multiplication using TBB

/* matrix-tbb.cpp */
#include <tbb/parallel_for.h>
#include <tbb/blocked_range.h>
using namespace tbb;

const int size = 1000;
float a[size][size];
float b[size][size];
float c[size][size];

class Multiply {
public:
    void operator()( ...

Parallelizing Matrix Multiplication using Cilk++ in Two Lines (2010-08-15)
Following the parallelization of matrix multiplication using OpenMP in "Parallelizing Matrix Multiplication using OpenMP in One Line", can we do the same using Cilk++?

Parallel Matrix Multiplication using Cilk++

/* matrix.cilk */
const int size = 1000;
float a[size][size];
float b[size][size];
float c[size][size];

int cilk_main()
{
    // Initialize buffers.
    for (int i = 0; i < size; ++i) {
        for ( ...

Parallelizing Matrix Multiplication using OpenMP in One Line (2010-08-14)
Matrix multiplication is often used for academic study. It is well suited for parallelization due to its intensive O(N^3) computation and the independence of each output element. Parallel programming is hard. Does it surprise you that we can parallelize matrix multiplication with merely one line of OpenMP directive?

Serial Matrix Multiplication

/* matrix.cpp */
const int size = 1000;
float a[size][size];
float b[size][size];
...

Parallel Programming - Hello World (2010-08-11)
Many computer science/engineering students write a Hello World program in their first programming lecture.
What's your first parallel program? What about a Hello World program in OpenMP, MPI, Cilk++, TBB, Ruby threads, or Pthreads?

Hello World in C

/* hello.c */
#include <stdio.h>

int main()
{
    printf("hello world\n");
    return 0;
}

$ gcc hello.c -o hello
$ ./hello
hello world

Hello World in ...

Parallel Programming - What Are The Options? (2010-07-31)
There are simply too many parallel programming languages and libraries to keep track of. Many of them are no longer in active development, or are difficult to get working on decent operating systems. What are the practical options currently available for multi-core CPUs or GPUs?

OpenMP
Hardware: shared-memory multi-core CPU systems.
Parallelization: use directives, e.g. #pragma omp parallel {} in C ...

Who Is Responsible For The Programming Of Multi-Core CPU And GPU? (2010-07-29)
Multi-core CPUs and GPUs are now commodity products. But where is the software that could take advantage of their parallel architecture? Who should be developing such software? The domain expert? The HPC (high performance computing) software engineer? Or parallel programming tools such as auto-parallelizing compilers? Domain experts typically do not wish to spend too much time on computing problems. ...

Why Can't Compilers Auto-Parallelize Serial Code Effectively? (2010-07-28)
An auto-parallelizing tool takes in a serial code base in C/C++/Fortran etc. and produces a parallel version of the code.
For instance, specifying the -parallel option at compile time with the Intel compiler produces a parallelized binary with the OpenMP runtime. The MIPSpro compiler provides a similar auto-parallelizing function with the -apo option, where you can view the code transformation, which consists of SGI OpenMP ...

Where Are All The Practical Parallel Algorithms and Libraries? (2010-07-22)
Multi-core CPUs and GPUs are everywhere nowadays, from laptops to desktops to high-end computing clusters. Is your particular application running any faster? Nope. Generally, you need parallel algorithms for an application to make full use of the multiple cores. Perhaps you'd expect that some searches on the web, research publications, and academic books would provide you all the state-of-the-art ...

Why Is Parallel Programming Difficult? (2010-07-21)
Parallel programming is generally perceived as an activity only for people pursuing high-tech, bleeding-edge research. It is difficult and alien enough to drive most software engineers away, whether that is really the case or merely a misconception. The fact is, software engineers run away from parallel programming while modern general-purpose processors consist of more and more cores ...