
Problems with MPI vasp at runtime

Posted: Thu Jun 08, 2006 7:45 pm
by brockp
I am a sysadmin helping a user install VASP on our Linux (RHEL 4.0) Opteron cluster. The compiler is PGIF90 6.1 and the MPI library is Open MPI 1.0.2. The serial version builds and runs just fine, but the parallel version gives the following error:

running on 2 nodes
[] *** An error occurred in MPI_Cart_create
[] *** on communicator MPI_COMM_WORLD
[] *** MPI_ERR_OTHER: known error not in list
[] *** MPI_ERRORS_ARE_FATAL (goodbye)
[] *** An error occurred in MPI_Cart_create
[] *** on communicator MPI_COMM_WORLD
[] *** MPI_ERR_OTHER: known error not in list
[] *** MPI_ERRORS_ARE_FATAL (goodbye)
1 additional process aborted (not shown)

This is a regular OMPI error, and I have contacted the Open MPI developers. I am posting here to see whether anyone else has seen this problem and, if so, how (or whether) they were able to fix it.


Posted: Fri Jun 09, 2006 2:03 pm
by brockp
It looks like the problem isn't with Open MPI. Here is the result of rebuilding everything with lam-7.1.2:

bash-3.00$ mpirun -np 2 ./vasp
running on 2 nodes
MPI_Cart_create: invalid dimension argument: Invalid argument (rank 0, MPI_COMM_WORLD)
Rank (0, MPI_COMM_WORLD): Call stack within LAM:
Rank (0, MPI_COMM_WORLD): - MPI_Cart_create()
Rank (0, MPI_COMM_WORLD): - main()
MPI_Cart_create: invalid dimension argument: Invalid argument (rank 1, MPI_COMM_WORLD)
Rank (1, MPI_COMM_WORLD): Call stack within LAM:
Rank (1, MPI_COMM_WORLD): - MPI_Cart_create()
Rank (1, MPI_COMM_WORLD): - main()

Could it be a problem with the user's input, such that the problem can't be decomposed correctly? This input works fine with the serial version of VASP.

Posted: Wed Jun 14, 2006 11:03 am
by job
Have you compiled vasp with -i8 and the mpi library with default settings? That won't work.
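
To illustrate why this mismatch would show up in MPI_Cart_create in particular, here is a minimal sketch in Python (not VASP or MPI code). It assumes a little-endian machine such as an Opteron and a hypothetical 2 x 1 process grid: the caller stores the grid dimensions as 8-byte integers, as -i8 would, while the library reads the same memory as 4-byte integers. The function name is made up for the illustration.

```python
import struct

def read_i8_dims_as_i4(dims):
    """Show what a 4-byte-integer library sees in an 8-byte-integer array.

    dims is a hypothetical Cartesian grid shape written by a caller
    compiled with -i8; the library is assumed to use 4-byte integers.
    """
    n = len(dims)
    # Caller side: n little-endian 8-byte integers.
    raw = struct.pack("<%dq" % n, *dims)
    # Library side: reads only the first n 4-byte integers of that memory.
    return list(struct.unpack_from("<%di" % n, raw))

print(read_i8_dims_as_i4([2, 1]))  # prints [2, 0]
```

Under these assumptions the library would see a dimension of size 0, which is consistent with an "invalid dimension argument" complaint, though the thread itself does not confirm that this is the exact mechanism.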

Posted: Thu Jun 29, 2006 5:22 am
by c00jsh00

We have a similar problem. Have you solved it yet?

Posted: Mon Jul 24, 2006 11:20 am
by admin
Please check whether LAM was compiled in the same bit mode that you used to compile VASP.

Posted: Tue Sep 26, 2006 7:02 pm
by brockp
[quote="job"]Have you compiled vasp with -i8 and the mpi library with default settings? That won't work.[/quote]

$ mpirun -np 2 -v ./vasp
running on 2 nodes
[] *** An error occurred in MPI_Cartdim_get
[] *** on communicator MPI_COMM_WORLD
[] *** MPI_ERR_COMM: invalid communicator
[] *** MPI_ERRORS_ARE_FATAL (goodbye)
distr: one band on 1 nodes, 1 groups
1 process killed (possibly by Open MPI)

So I still have not made any progress. I also added -Ddebug to the flags, but VASP did not display anything.

Also, what does -Dkind8 mean?

Posted: Tue Oct 03, 2006 1:07 pm
by brockp
The problem was solved using the following:

Open MPI would not work with VASP. This is unfortunate, since both MPICH and LAM are no longer being developed, and moving to a more up-to-date MPI library like Open MPI would be a plus in the future. I'm not sure whether Open MPI or VASP is causing the problem, so I will pass it on to the OMPI developers to see if we can fix it.

PGI 6.1 -i4
Matching the size of LOGICALs and such was a real pain. It's not documented anywhere, but the default PGI makefile for Linux has -i8 in it. This caused quite a headache. The integer size MUST match what your MPI library was built with.
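
As a quick way to spot this before building, a small script can list every -i4 or -i8 that appears outside comments in a makefile. This is only a sketch: the function name and the sample makefile text are made up for illustration and are not taken from the actual VASP makefiles.

```python
import re

def integer_size_flags(makefile_text):
    """Return (line_number, flag) pairs for each -i4 or -i8 outside comments."""
    hits = []
    for number, line in enumerate(makefile_text.splitlines(), start=1):
        code = line.split("#", 1)[0]  # ignore everything after a '#'
        # Match -i4 / -i8 only as whole, whitespace-delimited words.
        for match in re.finditer(r"(?<!\S)-i([48])(?!\S)", code):
            hits.append((number, "-i" + match.group(1)))
    return hits

# Hypothetical makefile fragment, for illustration only.
sample = """\
FC     = pgf90
# FFLAGS = -Mfree -i4
FFLAGS = -Mfree -i8
"""
print(integer_size_flags(sample))  # prints [(3, '-i8')]
```

Whatever the script reports then has to be compared by hand with how the MPI library itself was configured.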

VASP is now running in MPI. GotoBLAS was very slow on the example case I had (I don't know why); ACML 3.5 was slightly faster than ATLAS.

This was on Opteron 244 nodes with Gigabit Ethernet networking (non-blocking, jumbo frames). I hope this helps someone else. If you want, I can provide Makefiles to anyone having trouble.

Center for Advanced Computing
University of Michigan (Ann Arbor)

Posted: Tue Oct 17, 2006 6:06 am
by atogo
I met a similar problem. In my case, I could solve it as follows:

Set the compiler with its full directory path. Using 'FC=mpif90' and relying on $PATH doesn't work: a shared library is missing. I don't know why.

Another option is to link statically (libmpi.a, liborte.a and libopal.a in the Open MPI case). Remember to copy the header files from the 'include' directory of Open MPI.
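
To find out which shared library is missing, the output of ldd on the binary can be checked for "not found" entries. The sketch below only parses text that has already been captured; the function name is made up, and the sample output is hypothetical, since this thread does not say which library was actually missing.

```python
def missing_shared_libraries(ldd_output):
    """Return the names of libraries that ldd reported as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        # Typical ldd lines look like: "name => path (address)".
        name, arrow, target = line.strip().partition(" => ")
        if arrow and target.startswith("not found"):
            missing.append(name)
    return missing

# Hypothetical ldd output, for illustration only.
sample = """\
	libmpi.so.0 => not found
	libm.so.6 => /lib64/libm.so.6 (0x0000003a1be00000)
"""
print(missing_shared_libraries(sample))  # prints ['libmpi.so.0']
```

If a library shows up here, either its directory has to be made visible to the dynamic linker at run time or the static linking route above avoids the lookup altogether.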