VASP scalability on Nehalem with Gigabit interconnection

Questions regarding the compilation of VASP on various platforms: hardware, compilers and libraries, etc.


Moderators: Global Moderator, Moderator

Brane
Newbie
Posts: 14
Joined: Sat Feb 05, 2005 3:53 am
License Nr.: 866
Location: U.S.

VASP scalability on Nehalem with Gigabit interconnection

#1 Post by Brane » Tue Oct 20, 2009 4:36 pm

Dear All,

We recently bought some dual quad-core Nehalem X5550 machines to run VASP 4.6. The machines are connected via ordinary Gigabit Ethernet. We use CentOS, ifort 11.1.038, and Intel MKL 10.2.2.025; the MPI library is OpenMPI 1.3.3. We are happy to see that VASP runs very fast using 8 cores in a single Nehalem box. However, we are frustrated to find that the scalability across two or more Nehalem boxes is very poor, i.e. VASP does not scale at all across boxes. The walltime vs. number of cores for a test system (a 2x2 surface) is shown below:

Cores   Time (seconds)
  8         85.78
 16        116.93
 24        245.06
 32        261.96

These results surprised me; VASP actually scales quite well on our AMD Barcelona clusters with the same Gigabit interconnection.
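For reference, the cross-box jobs are launched with a plain OpenMPI hostfile, roughly like this (the hostnames and the path to the vasp binary are placeholders, not our real setup):

$ cat hosts
node01 slots=8
node02 slots=8
$ mpirun -np 16 -hostfile hosts --mca btl tcp,sm,self /opt/vasp-4.6/vasp

i.e. nothing special on the OpenMPI side beyond selecting the TCP, shared-memory and self transports explicitly.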

I am wondering whether a low-latency interconnect such as InfiniBand is a must to get good scalability across Nehalem boxes.

I would appreciate it if anybody could share their experience running VASP efficiently on Nehalem with a Gigabit interconnection.


Thanks.

pafell
Newbie
Posts: 24
Joined: Wed Feb 18, 2009 11:40 pm
License Nr.: 196
Location: Poznań, Poland

VASP scalability on Nehalem with Gigabit interconnection

#2 Post by pafell » Fri Oct 23, 2009 7:28 am

Have you tried running the same test on a single core? What is the actual speed-up from using 8 cores in a single node? Are those times for a single iteration? If not, compare single iterations.
With a small test system you send a lot of data between hosts, and the calculation takes less time than the MPI communication between nodes. When I compare a new computer to one of "the old ones", I use a test case in which a single iteration runs for about 1000 seconds. That keeps the MPI overhead relatively small and makes the test similar to the jobs that will actually be run in the near future.
For the same reason, comparing a much slower CPU with a fast Nehalem over MPI makes little sense for small test cases.
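For what it's worth, the per-iteration timings can be pulled straight out of OUTCAR, e.g. (assuming a standard OUTCAR):

grep "LOOP" OUTCAR

The LOOP: lines are the individual electronic (SCF) iterations and LOOP+ is the complete ionic step, so dividing a LOOP+ time by the number of SCF iterations gives a per-iteration number to compare between machines.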

Sorry for going slightly off-topic, but I believe that if you rerun the test case this way your results won't look like such a drama.

Brane
Newbie
Posts: 14
Joined: Sat Feb 05, 2005 3:53 am
License Nr.: 866
Location: U.S.

VASP scalability on Nehalem with Gigabit interconnection

#3 Post by Brane » Mon Oct 26, 2009 3:49 am

Hi Pafell,

Thanks for your comments.

As I wrote in my previous post, I am interested in the scalability of Nehalem boxes connected with Gigabit Ethernet. My tests show that VASP scales very well within a single Nehalem box: a test job on 8 cores in one box is about 6 times faster than the same job on a single core. However, when the job runs across two boxes, the walltime increases and VASP does not scale. The times given above are the walltime per ionic step (i.e. 'grep LOOP+ OUTCAR').

You may argue that my test system is small and that the MPI communication between boxes dominates; however, I also tested a big system and got the same result. Meanwhile, I ran the same benchmarks on a small InfiniBand-connected Nehalem test cluster and found that VASP scales quite well there. The problem is that InfiniBand is too expensive and we may have to end up with Gigabit Ethernet. That is why I would very much like to see good scalability on Nehalem with a Gigabit Ethernet connection, as is the case on our AMD Barcelona boxes.

Thanks.

alex
Hero Member
Posts: 577
Joined: Tue Nov 16, 2004 2:21 pm
License Nr.: 5-67
Location: Germany

VASP scalability on Nehalem with Gigabit interconnection

#4 Post by alex » Mon Oct 26, 2009 7:56 am

Hello Brane,

one way to cut down the heavy (and slow) communication is to increase NPAR. Each process then does more numerics on its own but communicates less. Gigabit Ethernet typically profits from that.
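For a 16-core (two-box) run that could mean something like this in the INCAR (just a sketch; the best value depends on the system, so it is worth a quick scan):

NPAR = 16

i.e. one band per core, so each process does its own FFTs and less data has to go over the Ethernet in every iteration.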

Cheers

Alex
