Parallel version of VASP installation issue

Questions regarding the compilation of VASP on various platforms: hardware, compilers and libraries, etc.


felixvasp

#1 Post by felixvasp » Sun Sep 09, 2012 1:21 pm

Hi everyone,
I have run into a problem with my installation of VASP/5.2.12.NOV. The source code compiled successfully, but the resulting binary only works properly when a job runs on a single node (8 cores). If I run the same job on 2 nodes (8 cores each), it behaves abnormally: after ten minutes it still has not started the SCF cycle, whereas on 1 node the whole job finishes within ten minutes. This is very strange to me, since I don't have much experience compiling parallel codes.
The compilation was done with the Intel 11.1.056 compiler, openmpi-intel (Open MPI built with the Intel compilers) and FFTW 3.2.2. Does anyone know why it doesn't work across multiple nodes? Any suggestions or thoughts would be highly appreciated.

peterklaver
Newbie
Posts: 31
Joined: Thu Apr 21, 2005 9:28 am
Location: Netherlands

#2 Post by peterklaver » Wed Sep 19, 2012 3:53 pm

Hi Felix,

Have you or others successfully run MPI codes other than VASP across multiple nodes? I can't tell exactly what the problem is from your description, but it wouldn't surprise me if it lies in your MPI installation rather than in VASP itself on your cluster.

felixvasp

#3 Post by felixvasp » Fri Sep 21, 2012 7:23 pm

Hi Peter,
Thanks for the reply.
I guess MPI runs fine, since I compiled VASP on our university's HPC cluster and other parallel programs run there. Do you think the problem is the high bandwidth demand of inter-node communication? If I change the -DMPI_BLOCK value to a larger number, the run seems to speed up a little, but it is still slower than running on multiple cores of a single node. I also tried the same makefile settings on another HPC cluster with an InfiniBand interconnect, and there the parallel version of VASP is much faster. Still, this doesn't rule out an incorrectly installed MPI. If such an MPI problem does exist, do you know a way to test for it, or can you point me to a source on the web? Thanks a lot. Have a nice day.


Felix
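A common way to check whether inter-node MPI communication itself is healthy, independent of VASP, is a "ping-pong" test: two ranks bounce a message back and forth, and you compare latency and bandwidth with both ranks on one node against the same test with the ranks on two different nodes. Established suites such as the OSU Micro-Benchmarks (osu_latency, osu_bw) or the PingPong test of the Intel MPI Benchmarks do this properly. The following is only a minimal sketch, assuming Python with the mpi4py package is available on the cluster; the script name pingpong.py, the message sizes and the repetition count are arbitrary choices.

```python
"""Minimal MPI ping-pong sketch (assumes mpi4py is installed)."""
import time


def bandwidth_mb_s(nbytes, round_trip_s):
    """Effective bandwidth in MB/s: one round trip carries the message twice."""
    return 2 * nbytes / round_trip_s / 1e6


def ping_pong(comm, MPI, nbytes, reps=50):
    """Mean round-trip time (s) for an nbytes message between ranks 0 and 1."""
    buf = bytearray(nbytes)
    rank = comm.Get_rank()
    comm.Barrier()  # start all ranks together
    start = time.perf_counter()
    for _ in range(reps):
        if rank == 0:
            comm.Send([buf, MPI.BYTE], dest=1, tag=0)
            comm.Recv([buf, MPI.BYTE], source=1, tag=0)
        elif rank == 1:
            comm.Recv([buf, MPI.BYTE], source=0, tag=0)
            comm.Send([buf, MPI.BYTE], dest=0, tag=0)
    return (time.perf_counter() - start) / reps


def main():
    try:
        from mpi4py import MPI
    except ImportError:
        print("mpi4py not available; skipping ping-pong test")
        return
    comm = MPI.COMM_WORLD
    if comm.Get_size() < 2:
        print("run with 2 MPI ranks, e.g. mpirun -np 2 python pingpong.py")
        return
    rank = comm.Get_rank()
    # Show where each rank runs, to confirm they really are on different nodes.
    hosts = comm.gather(MPI.Get_processor_name(), root=0)
    if rank == 0:
        print("ranks on hosts:", hosts)
    for nbytes in (8, 1024, 1024 * 1024, 16 * 1024 * 1024):
        rtt = ping_pong(comm, MPI, nbytes)
        if rank == 0:
            # One-way latency is conventionally reported as half the round trip.
            print(f"{nbytes:>10d} B  latency {rtt / 2 * 1e6:10.1f} us  "
                  f"bandwidth {bandwidth_mb_s(nbytes, rtt):9.1f} MB/s")


if __name__ == "__main__":
    main()
```

Run it once with both ranks on the same node and once with the scheduler or hostfile arranged so that the two ranks land on different nodes. If the two-node numbers are far worse than what the network hardware should deliver, the MPI or network setup is the first thing to suspect.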


peterklaver

#4 Post by peterklaver » Sat Sep 22, 2012 9:56 am

Hi Felix,

Running on a single node will very likely be faster in most cases, because communication within one node does not have to go over the network and is therefore very fast, so there is virtually no communication delay. What you describe seems to fit a strong communication bottleneck between nodes (which is much less severe with InfiniBand and similar interconnects). I have no experience varying the -DMPI_BLOCK preprocessor option, though, so I can't say anything about that one.
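Why the interconnect matters so much can be sketched with the simple latency-bandwidth (Hockney) model, in which sending one message of n bytes costs latency + n / bandwidth. The latency and bandwidth values below are illustrative placeholders, not measurements (only the 125 MB/s figure follows directly from the 1 Gbit/s line rate of gigabit Ethernet); replace them with values measured on your own cluster. The message count and size are likewise hypothetical.

```python
def transfer_time_s(nbytes, latency_s, bandwidth_bytes_s):
    """Latency-bandwidth (Hockney) estimate for sending one message."""
    return latency_s + nbytes / bandwidth_bytes_s


def total_comm_time_s(n_messages, nbytes, latency_s, bandwidth_bytes_s):
    """Total time for n_messages equal-sized messages sent one after another."""
    return n_messages * transfer_time_s(nbytes, latency_s, bandwidth_bytes_s)


# (latency in s, bandwidth in bytes/s): assumed ballpark values, NOT measurements.
LINKS = {
    "gigabit ethernet (assumed)": (50e-6, 125e6),  # 1 Gbit/s = 125 MB/s peak
    "infiniband (assumed)": (2e-6, 3e9),
}

for name, (lat, bw) in LINKS.items():
    t = total_comm_time_s(10_000, 8 * 1024, lat, bw)
    print(f"{name:28s} {t:.3f} s for 10,000 messages of 8 KiB")
```

With these placeholder values the Ethernet case comes out more than an order of magnitude slower, which is the kind of gap that can make a multi-node run slower than a single-node one.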

The run you mentioned in your initial post must have been fairly small, since it finished in 10 minutes on a single node. With so little CPU time required, communication can account for a relatively large share of the total run time. You could try a much bigger system instead, but make sure the calculation still fits comfortably into RAM: if the node starts swapping data to disk, your timing will be unrealistically slow (use the makeparam utility bundled with VASP to check the memory requirements). For a big system with many atoms, the communication bottleneck should matter less, so the time required on multiple nodes should improve relative to running on one node.

If that does happen, your VASP installation may be fine as it is. You should then probably use parallel runs across multiple nodes only for big jobs, where multiple nodes really pay off, and run small jobs on a single node.
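The trade-off described in this post can be illustrated with a deliberately crude cost model: the compute work is split evenly over all cores, and going from one node to two adds a fixed inter-node communication cost. The function name two_node_speedup, the 8 cores per node and the 600 s communication cost are hypothetical placeholders, and treating the communication cost as fixed is a simplification (in a real VASP run it also depends on the system size and the parallelization settings).

```python
def two_node_speedup(work_core_s, comm_s, cores_per_node=8):
    """Speedup of 2 nodes over 1 node in a toy cost model.

    work_core_s: total compute time if run on a single core (hypothetical)
    comm_s: extra inter-node communication time on 2 nodes (assumed fixed)
    """
    t_one_node = work_core_s / cores_per_node  # no inter-node traffic
    t_two_nodes = work_core_s / (2 * cores_per_node) + comm_s
    return t_one_node / t_two_nodes


# 4,800 core-seconds = 10 minutes on one 8-core node; the larger jobs are
# 10x and 100x bigger. The 600 s communication cost is a made-up value.
for work in (4_800, 48_000, 480_000):
    speedup = two_node_speedup(work, comm_s=600)
    print(f"work {work:>7,d} core-s: 2-node speedup {speedup:.2f}")
```

With these made-up numbers, a job that takes 10 minutes on one node actually runs slower on two nodes (speedup below 1), while a job 100 times larger comes close to the ideal factor of 2.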
