[SOLVED] VASP 5 crashes when using several computing nodes (large memory)
Posted: Tue Oct 09, 2012 1:34 pm
Hello Everyone,
I have compiled VASP 5.3.2 without errors, and it runs properly when I use only one computing node of our cluster (each node has two hexacore processors). The problem arises when I try to use more than one node: the run crashes at the very beginning, in "Iteration 1(1)", just after EDDIAG finishes (when the same simulation runs on a single node, the next step is RMM-DIIS).
The solution suggested in this post http://cms.mpi.univie.ac.at/vasp-forum/ ... hp?3.11392 did not fix the problem.
I have posted several files on Dropbox with all the information I could gather about this issue:
- File 'Makefile' ( https://dl.dropbox.com/u/27436218/Makefile ): the Makefile used to build VASP. In brief, I used Intel MPI 4.0.3, the Intel Compilers XE2013, Intel MKL BLACS, and FFTW3 (from the Intel Toolkit XE2013).
- File 'simul.log' ( https://dl.dropbox.com/u/27436218/simul.log ): the messages printed to the screen while the crashing simulation runs. I passed several options to mpirun to get information about the MPI calls, since the problem seems to be there ( -v -check_mpi -genv I_MPI_DEBUG 5 ); a sketch of the full launch line follows this list. The interesting information is at the end of the file.
- File 'INCAR' ( https://dl.dropbox.com/u/27436218/INCAR ): the input file of the simulation I am trying to run, in case it is relevant. In brief, it is an ionic relaxation. The same input file works fine when I use only one computing node, so I don't think the problem is there.
- File 'OUTCAR' ( https://dl.dropbox.com/u/27436218/OUTCAR ): the output file of the simulation.
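For reference, the run is launched roughly as in the sketch below; the hostfile name and the process count of 24 (2 nodes x 12 cores) are placeholders, not copied from the actual job script:
Code:
mpirun -v -check_mpi -genv I_MPI_DEBUG 5 -machinefile ./hosts -np 24 /home/ivasan/progrmas/VASP/vasp.5.3_test/vasp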
It seems from simul.log that the errors are related to MPI since there are messages such as:
Code:
[23] ERROR: LOCAL:EXIT:SIGNAL: fatal error
[23] ERROR:    Fatal signal 11 (SIGSEGV) raised.
[23] ERROR:    Signal was encountered at:
[23] ERROR:       hamil_mp_hamiltmu_ (/home/ivasan/progrmas/VASP/vasp.5.3_test/vasp)
[23] ERROR:    After leaving:
[23] ERROR:       mpi_allreduce_(*sendbuf=0x7fff5d1ce340, *recvbuf=0x18e19c0, count=1, datatype=MPI_DOUBLE_PRECISION, op=MPI_SUM, comm=0xffffffffc4060000 CART_SUB CART_CREATE CART_SUB CART_CREATE COMM_WORLD [18:23], *ierr=0x7fff5d1ce2ac->MPI_SUCCESS)
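In case it helps whoever looks at the trace: as far as I can tell, the call it points to is a plain allreduce of a single double-precision value with MPI_SUM over a sub-communicator derived from a Cartesian communicator (the CART_SUB / CART_CREATE chain in the comm field). Below is a minimal standalone C sketch of that call pattern, purely my own illustration and not VASP code:
Code:
/* Illustration only -- not VASP source. It reproduces the call shape from the
 * trace: MPI_Allreduce of one double with MPI_SUM on a sub-communicator taken
 * from a Cartesian communicator. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int nprocs, rank;
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Build a 2-D Cartesian grid over all ranks ... */
    int dims[2] = {0, 0}, periods[2] = {0, 0};
    MPI_Dims_create(nprocs, 2, dims);

    MPI_Comm cart, row;
    MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, 0, &cart);

    /* ... then keep only the second dimension, giving one sub-communicator
       per row (the CART_CREATE -> CART_SUB pattern shown in the trace). */
    int remain[2] = {0, 1};
    MPI_Cart_sub(cart, remain, &row);

    /* The reported call shape: count = 1, double precision, MPI_SUM.
       (The trace shows MPI_DOUBLE_PRECISION, the Fortran equivalent of
       MPI_DOUBLE.) */
    double local = 1.0, total = 0.0;
    MPI_Allreduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, row);

    if (rank == 0)
        printf("sum over my row communicator: %g\n", total);

    MPI_Comm_free(&row);
    MPI_Comm_free(&cart);
    MPI_Finalize();
    return 0;
}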
I would appreciate it if anyone could give me a hint about what I can check or modify to solve this problem.
Thank you very much in advance for your answers and your time.
Kind regards,
Ivan