About this Entry
Posted by: kenny_tm

Original: 4/8/2008 7:48 AM
Views: 33323
Comments: 8
eProps: 6



Tuesday, April 08, 2008

Parallel programming using OpenMP with Visual C++ 2008 Express

 

Parallel programming means running several commands at the same time to reduce running time. With parallel programming, a dual-core machine can theoretically achieve up to a 2× speedup. OpenMP is a simple way to get simple parallel programs working in C/C++ (and also FORTRAN).

Let's consider a simple program:

for (int i = 0; i < 1048576; ++ i)
  a[i] = sin(i);

If each assignment takes 1 second, the whole program needs about 1 million seconds to complete! That's more than 11 days! But this for loop is parallelizable: all the assignment statements are independent of each other. If we had 1 million CPUs, we wouldn't need to wait 1 million seconds. Just distribute the assignments among those 1 million CPUs, and the whole loop can be completed in 1 second.

Of course you don't have 1 million CPUs. But in modern computers a CPU has 2 or more cores that can work independently, so at least you can split the loop into 2 parts and let the cores compute them at the same time. That would reduce the running time to about 5½ days, still a dramatic improvement, right? This is the basis of parallel programming: we divide the labor among different computing devices and get a constant-factor reduction in running time.
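To make "split the loop into 2 parts" concrete, here is a minimal C++ sketch of cutting the iteration range [0, n) into contiguous pieces, one per core. The names chunk_range and fill_sines_in_parts are hypothetical. The sketch only illustrates the kind of split that OpenMP does for you. Here the parts run one after another; on a real machine each part could go to its own core, because the parts write disjoint slices of the array.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Half-open index range [begin, end).
struct Range { int begin; int end; };

// Hypothetical helper: part p of [0, n) when the range is cut into `parts`
// contiguous pieces whose sizes differ by at most 1.
Range chunk_range(int n, int parts, int p) {
    assert(n >= 0 && parts > 0 && p >= 0 && p < parts);
    const int base = n / parts;   // every part gets at least this many iterations
    const int extra = n % parts;  // the first `extra` parts get one more
    const int begin = p * base + (p < extra ? p : extra);
    const int size = base + (p < extra ? 1 : 0);
    Range r = { begin, begin + size };
    return r;
}

// Fills a[i] = sin(i) part by part; each part touches only its own slice of a.
void fill_sines_in_parts(std::vector<double>& a, int parts) {
    const int n = static_cast<int>(a.size());
    for (int p = 0; p < parts; ++p) {
        const Range r = chunk_range(n, parts, p);
        for (int i = r.begin; i < r.end; ++i)
            a[i] = std::sin(static_cast<double>(i));
    }
}
```

With n = 1048576 and 2 parts, the pieces are [0, 524288) and [524288, 1048576).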

Parallelizing a loop is not always immediately possible. Suppose I change the program to

for (int i = 1; i < 1048576; ++ i)
  a[i] = a[i-1] + sin(i);

then it is impossible to use 1 million CPUs to compute the values simultaneously and get a correct answer. Each value a[i] depends on the previous one, a[i-1], so they are not independent anymore. CPU #7478 must wait for CPU #7477 to compute its result. The whole process will still take 11 days.

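Of course this program can be transformed into a parallelizable one. Note that it only computes running sums of sines. The final value is essentially $\sum_{i < 1048576} \sin i$, which equals $\sum_{i < 524288} \sin i + \sum_{524288 \le i < 1048576} \sin i$, and the two partial sums are independent of each other. (The intermediate values in the second half just need the total of the first half added to them afterwards.) So we can still distribute the work to 2 CPUs and bring the computation time down to about 5½ days.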
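The two-half trick can be sketched in C++ as follows. This is a minimal illustration, assuming the loop starts at i = 1 with a given a[0]; the function name running_sine_sum is hypothetical. Each half first computes its own running sum in an OpenMP section (sections are explained further down). Afterwards, the first half's last value is added to every element of the second half. If OpenMP support is off, the pragmas are ignored and the code runs serially with the same results.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical helper: returns a with a[0] = a0 and a[i] = a[i-1] + sin(i)
// for 1 <= i < n, computed as two independent halves plus a fix-up pass.
std::vector<double> running_sine_sum(int n, double a0) {
    assert(n >= 1);
    std::vector<double> a(n);
    a[0] = a0;
    const int mid = std::max(1, n / 2);  // halves are [0, mid) and [mid, n)

    // Phase 1: neither half reads the other half's elements.
    #pragma omp parallel sections
    {
        #pragma omp section
        {
            for (int i = 1; i < mid; ++i)
                a[i] = a[i - 1] + std::sin(static_cast<double>(i));
        }
        #pragma omp section
        {
            double local = 0.0;  // sin(mid) + ... + sin(i)
            for (int i = mid; i < n; ++i) {
                local += std::sin(static_cast<double>(i));
                a[i] = local;
            }
        }
    }

    // Phase 2: shift the second half by the first half's final value.
    const double offset = a[mid - 1];
    #pragma omp parallel for
    for (int i = mid; i < n; ++i)
        a[i] += offset;
    return a;
}
```

The results can differ from the serial loop in the last few bits, because the additions happen in a different order.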

Enough theory! Suppose you have written a parallelizable loop. How do you split it into two and really let 2 CPUs handle the parts independently? Here comes OpenMP. You just need to add one line before the for loop:

#pragma omp parallel for
for (int i = 0; i < 1048576; ++ i)
  a[i] = sin(i);

then the compiler will automatically handle all the difficulties for you!
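Here is the same loop wrapped in a self-contained function, as a minimal sketch (the name fill_sines is hypothetical). Each iteration writes only its own element a[i], so the parallel for is safe. If OpenMP support is switched off, the pragma is ignored (perhaps with a warning) and you get the ordinary serial loop with identical results.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical helper: a[i] = sin(i) for every i, iterations split across threads.
std::vector<double> fill_sines(int n) {
    assert(n >= 0);
    std::vector<double> a(n);
    // No two iterations touch the same element, so no synchronization is needed.
    #pragma omp parallel for
    for (int i = 0; i < n; ++i)
        a[i] = std::sin(static_cast<double>(i));
    return a;
}
```

The explicit cast to double also avoids an ambiguous-overload error that some C++ compilers report for sin(int).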

OpenMP support was added to the compiler starting with MS Visual Studio 2005, but it is only supported in the expensive Professional and Team System (Enterprise) editions. However, before you take out your credit card, note that the free Express edition does have a compiler option for OpenMP (Configuration Properties → C/C++ → Language → OpenMP Support). After all, all editions of VC share the same compiler. Let's turn that on and see if the MSDN documentation is wrong… Well, actually not. The linker complains that VCOMP.lib is missing (I'll assume you are compiling in Release mode). Luckily it is just one missing file: you only need to grab it and let VC know where it is! The legal and free (free as in free beer :)) way to get VCOMP.lib is to install the Windows SDK for Windows Server 2008 and .NET Framework 3.5. After installation, VCOMP.lib resides in %PROGDIR%\Microsoft SDKs\Windows\v6.1\lib. (This SDK also has some goodies like the x64 compilers, which VC Express is missing.) You can now add this path to Configuration Properties → Linker → General → Additional Library Directories so that VC can find it.
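A quick way to check that the OpenMP Support option actually took effect is the standard _OPENMP macro. Compilers define it, as a yyyymm date of the supported OpenMP specification, only when OpenMP is enabled. The helpers below are hypothetical. Visual C++ 2008 should report the OpenMP 2.0 date, 200203, but treat that value as an assumption and print it yourself.

```cpp
#include <cassert>
#include <cstdio>

// Returns the value of _OPENMP (a yyyymm date), or 0 if OpenMP is not enabled.
long openmp_version() {
#ifdef _OPENMP
    return _OPENMP;
#else
    return 0;
#endif
}

// Prints a one-line report; handy as the first line of main() while setting up a project.
void report_openmp() {
    const long v = openmp_version();
    if (v > 0)
        std::printf("OpenMP enabled, _OPENMP = %ld\n", v);
    else
        std::printf("OpenMP is NOT enabled; the pragmas will be ignored\n");
}
```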

So far so good. The program now compiles, yet when you run it, it complains again! It says VCOMP90.DLL is missing and then crashes. Again, the legal and free way to get this DLL is to install the Microsoft Visual C++ 2008 Redistributable Package (x86) (or x64) and then copy VCOMP90.DLL next to your program (the EXE file). The DLL should be located at %WINDIR%\WinSxS\x86_microsoft.vc90.openmp_1fc8b3b9a1e18e3b_9.0.21022.8_none_ecdf8c290e547f39\vcomp90.dll. Now the program runs without crashing! (This also shows how to cut your running time by up to half on a dual-core machine without spending a penny.)

Now let's do a little experiment. Compile the following code:

#pragma omp parallel for
for (int i = 0; i < 10; ++ i)
  printf("%d ", i);

In your first-year programming class you would expect it to print 0 1 2 3 4 5 6 7 8 9. Now run the program. You will probably not see that. When I ran it just now, it printed 0 1 5 6 2 3 7 8 4 9. So what the heck is going on? Recall that with parallel programming the 2 (or more) CPUs are doing things simultaneously. Each one is printing its own share of the numbers at the same time, so the results get mixed together. This shows why the iterations of a parallel loop must really be independent of each other. Here all iterations write to the same output and the order matters, so you get an unexpected result.
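If you do need the output in order, one option is OpenMP's ordered construct. Here is a sketch (print_in_order is a hypothetical name). The loop gets the ordered clause, and the part that must happen in sequence goes into a #pragma omp ordered block. Work outside that block still runs in parallel, while the ordered block itself runs one iteration at a time, in loop order. The function also returns the order in which the ordered block ran, so it can be checked.

```cpp
#include <cassert>
#include <cstdio>
#include <vector>

// Hypothetical helper: prints 0 .. n-1 in order even inside a parallel loop.
std::vector<int> print_in_order(int n) {
    assert(n >= 0);
    std::vector<int> seen;  // order in which the ordered block executed
    #pragma omp parallel for ordered schedule(static)
    for (int i = 0; i < n; ++i) {
        // Independent work for iteration i could go here and would run in parallel.
        #pragma omp ordered
        {
            std::printf("%d ", i);
            seen.push_back(i);  // safe: only one thread at a time runs this block
        }
    }
    std::printf("\n");
    return seen;
}
```

Calling print_in_order(10) prints 0 1 2 3 4 5 6 7 8 9. The trade-off is that the printing is serialized, so this only pays off when most of each iteration's work sits outside the ordered block.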

Sometimes you don't want to parallelize a loop, but just want to run two (or more) independent blocks of statements at the same time, like the two halves of the sine sum above. Then you need the #pragma omp parallel sections directive. Look at the code below:

#pragma omp parallel sections
{
 for (int i = 0; i < 7; ++ i)
   printf("#  First block @ %d\n", i);
 #pragma omp section
 for (int j = 0; j < 7; ++ j)
   printf("? Second block @ %d\n", j);
 #pragma omp section
 for (int k = 0; k < 7; ++ k)
   printf("$  Third block @ %d\n", k);
}

When I run the program I get something like:

#  First block @ 0
#  First block @ 1
#  First block @ 2
? Second block @ 0
? Second block @ 1
#  First block @ 3
#  First block @ 4
? Second block @ 2
? Second block @ 3
#  First block @ 5
#  First block @ 6
? Second block @ 4
? Second block @ 5
$  Third block @ 0
$  Third block @ 1
? Second block @ 6
$  Third block @ 2
$  Third block @ 3
$  Third block @ 4
$  Third block @ 5
$  Third block @ 6

Each block runs sequentially, yet the 3 blocks are interleaved, which shows the parallelism. Of course you could split the sine summation into 2 parts and put each part in its own section. But if there are more than 2 CPUs around, the extra ones would sit idle. The proper way is to use

double total = 0;
#pragma omp parallel for reduction (+:total) schedule(static)
for (int i = 0; i < 800000; ++ i)
   total += sin(i);

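The reduction clause handles the splitting of the sum for you. Each thread accumulates into its own private copy of total. At the end of the loop, the faster threads wait for the slower ones, and all the partial sums are added together into total. (The schedule(static) clause states how the loop iterations are divided among the threads. Without a chunk size, static divides them into roughly equal contiguous chunks, one per thread. The OpenMP specification describes the other schedules.)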
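Here is a self-contained sketch comparing three ways of computing the sum discussed above: a plain serial loop, two hand-written sections, and the reduction clause. The function names are hypothetical. The sections version never uses more than 2 threads, while the reduction version uses as many as the runtime provides. Floating-point addition is not associative, so the parallel results can differ from the serial one in the last few bits. Compare them with a tolerance rather than with ==.

```cpp
#include <cassert>
#include <cmath>

// Serial reference: sin(0) + sin(1) + ... + sin(n-1).
double sine_sum_serial(int n) {
    double total = 0.0;
    for (int i = 0; i < n; ++i)
        total += std::sin(static_cast<double>(i));
    return total;
}

// Two sections, one per half: at most 2 threads, however many cores exist.
double sine_sum_sections(int n) {
    assert(n >= 0);
    const int mid = n / 2;
    double first = 0.0, second = 0.0;  // each written by exactly one section
    #pragma omp parallel sections
    {
        #pragma omp section
        {
            for (int i = 0; i < mid; ++i)
                first += std::sin(static_cast<double>(i));
        }
        #pragma omp section
        {
            for (int i = mid; i < n; ++i)
                second += std::sin(static_cast<double>(i));
        }
    }
    return first + second;  // both sections have finished here
}

// Reduction: each thread sums into a private copy of total; OpenMP adds the copies at the end.
double sine_sum_reduction(int n) {
    assert(n >= 0);
    double total = 0.0;
    #pragma omp parallel for reduction(+:total) schedule(static)
    for (int i = 0; i < n; ++i)
        total += std::sin(static_cast<double>(i));
    return total;
}
```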

Parallel programming is a deep topic. Once you get past the simplicity of parallelizing for loops and independent blocks, you will run into the dirt of critical sections and atomicity and race conditions and spirits and ghosts. (And I must rant here that the const and invariant madness in the D programming language is, again, because the language designer just wants to support multiprogramming.) In scientific computation I believe the above introduction is enough for most areas, as long as the program runs on a single PC. Suppose instead the program needs to coordinate data and computation across several linked computers (i.e. a cluster) at the code level. Then OpenMP can't help, since it only works with threads that share the memory of one machine. (I would suggest running several copies of the program on those machines and merging the data afterwards.)
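To give a taste of those race conditions, here is a small sketch; the function names and the (i * 7) % 1000 "work" are hypothetical stand-ins. In both functions every iteration updates shared data. A plain counter += 1 inside a parallel for is a race: two threads could read the same old value and both write back old + 1, silently losing an update. #pragma omp atomic makes each such single update indivisible. #pragma omp critical lets only one thread at a time run a whole block, which is needed when two variables must change together. Both serialize the updates, so for a plain sum the reduction clause shown above is usually faster.

```cpp
#include <cassert>

// Counts iterations with a shared counter; atomic prevents lost updates.
long count_atomic(int n) {
    assert(n >= 0);
    long counter = 0;
    #pragma omp parallel for
    for (int i = 0; i < n; ++i) {
        #pragma omp atomic
        counter += 1;
    }
    return counter;
}

struct MaxCount { int max; int count; };

// Tracks the largest value of (i * 7) % 1000 and how often it occurs.
// max and count must be updated together, so a critical section is used.
MaxCount max_and_count(int n) {
    assert(n >= 0);
    MaxCount result = { -1, 0 };
    #pragma omp parallel for
    for (int i = 0; i < n; ++i) {
        const int v = (i * 7) % 1000;  // stand-in for real per-iteration work
        #pragma omp critical
        {
            if (v > result.max) { result.max = v; result.count = 1; }
            else if (v == result.max) { ++result.count; }
        }
    }
    return result;
}
```

Since 7 and 1000 share no factors, every remainder 0..999 appears once per 1000 iterations. So max_and_count(10000) should report a maximum of 999 occurring 10 times, whatever the thread count.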

Notes:

  1. Sometimes you don't need OpenMP to make your program faster. Are you compiling in Release mode? Have you turned on optimization (look in C/C++ → Optimization)? If you are targeting modern CPUs, have you turned on support for SSE2 instructions (C/C++ → Code Generation → Enable Enhanced Instruction Set)? More importantly, is your algorithm optimized?
  2. If you want to use OpenMP in Debug mode as well, add vcomp.lib to Linker → Input → Additional Dependencies and put vcompd.lib in Ignore Specific Library, as shown in http://www.remoteplace.net/~kmt-t/php/upup/img/042.png.
  3. There is a header, omp.h, which you don't need to include unless you want runtime information such as the number of threads the program is running on.
  4. If you dislike Microsoft stuff, or you are using Unix or Mac OS (X): yes, the free and open GCC 4.2 supports OpenMP as well (enable it with the -fopenmp option). Note that the latest (as of 2008) MinGW and Cygwin use only GCC 3.4, so all development environments depending on them, including Dev-C++, do not support OpenMP. Switch to something else.
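For note 3, here is a minimal sketch of querying the runtime through omp.h; the helper name iterations_per_thread is hypothetical. omp_get_max_threads() and omp_get_thread_num() are standard OpenMP runtime routines. The #ifdef _OPENMP guards keep the code compiling, and running on a single thread, when OpenMP support is switched off.

```cpp
#include <cassert>
#include <cstdio>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Returns how many loop iterations each thread ran (index = thread number).
std::vector<int> iterations_per_thread(int n) {
    assert(n >= 0);
#ifdef _OPENMP
    const int threads = omp_get_max_threads();  // upper bound on the team size
#else
    const int threads = 1;
#endif
    std::vector<int> counts(threads, 0);
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < n; ++i) {
#ifdef _OPENMP
        const int t = omp_get_thread_num();
#else
        const int t = 0;
#endif
        counts[t] += 1;  // each thread only touches its own slot, so no race
    }
    for (int t = 0; t < threads; ++t)
        std::printf("thread %d ran %d iterations\n", t, counts[t]);
    return counts;
}
```

On a dual-core machine with the default settings you would expect two lines of roughly n/2 iterations each. That is an expectation, not a guarantee, since the runtime chooses the team size.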

8 Comments

This morning at Barn A I saw 甘仔 and thought he was reading his notes.
Haha, he said he was reading Xanga.

Your Xanga is getting more and more like a wiki ^^
Posted 4/9/2008 1:30 AM by heehee_yanyan


@heehee_yanyan - Wiki won't use exclamation marks that frequently :p

Posted 4/9/2008 3:42 PM by kenny_tm

I have installed «Windows SDK for Windows Server 2008 and .NET Framework 3.5». But there is no VCOMP.lib there :(
(I didn't install documentation, samples, and mobile tools)
Posted 3/9/2009 3:37 AM by Anton

@Anton - Same here, it looks like they removed it. Is it (legally) available elsewhere?
Posted 5/8/2009 3:35 AM by gl

I have found that if you install the SDK AFTER installing Visual Studio, vcomp.lib is installed directly into the Visual Studio folder. So there is no problem with it.
Posted 5/15/2009 12:54 PM by Anton

I, too, have installed the SDK. A search of everywhere on my hard drive shows that vcomp.lib was not installed. It seems it has been removed from the installation pack. Has anyone been able to work out an alternative?
Posted 5/19/2009 11:52 AM by Bryn

To fix the missing vcomp90.dll problem, add the following line of code:
#include <omp.h>
The presence of this line makes the build tools add vcomp90.dll to the program's manifest, allowing Windows to find it.
Posted 6/3/2009 5:58 AM by William


This work is licensed under a Creative Commons Attribution-Share Alike 3.0 License.