Random hangs in VASP jobs when parallelised on more than 32 cores

Questions regarding the compilation of VASP on various platforms: hardware, compilers and libraries, etc.


akuritu


#1 Post by akuritu » Thu Aug 08, 2013 8:16 am

Dear VASP admin,

I have installed the parallel version of VASP.5.2.11 with Intel Composer XE 2013 (composer_xe_2013.1.117), compiling it with the Intel MPI compiler wrapper mpiifort. Our cluster has 16 cores per node, and the Linux kernel version is 2.6.32-220.13.1.el6.x86_64. My jobs run successfully on 1 and 2 nodes (16 and 32 cores).
However, if I parallelise a job over 3 or 4 nodes (48 or 64 cores), it completes some ionic steps and then hangs. The job status still shows the job as running, but nothing more is written to the output files once it gets stuck. The hangs occur at random points and are not specific to the system I am studying.
In short, the same job runs successfully on 32 cores or fewer but hangs on more than 32 cores. We have reported this problem to the company that set up our cluster and installed BLAS etc. (Wipro). They claim that there is no problem with BLAS and LAPACK, since they have tested them with other software on more than 32 cores.

I have also tried VASP.5.2.12 and still get the same problem.
Please help me with this, since I cannot run jobs for bigger systems.
If you have any idea about this problem, please share your views.
Here is the main part of my makefile:

.SUFFIXES: .inc .f .f90 .F
SUFFIX=.f90
CPP_ = ./preprocess <$*.F | /usr/bin/cpp -P -C -traditional >$*$(SUFFIX)
FFLAGS = -FR -lowercase
OFLAG=-O3
OFLAG_HIGH = $(OFLAG)
OBJ_HIGH =
OBJ_NOOPT =
DEBUG = -FR -O0
INLINE = $(OFLAG)
BLAS= -L/opt/intel/composer_xe_2013.1.117/mkl/lib/intel64 -lmkl_blas95_lp64 -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lpthread

LAPACK= -L/opt/intel/composer_xe_2013.1.117/mkl/lib/intel64 -lmkl_lapack95_lp64

LINK =

FC=/opt/intel/impi/4.1.0.024/intel64/bin/mpiifort
FCL=$(FC)

CPP = $(CPP_) -DMPI -DHOST=\"LinuxIFC\" -DIFC \
-DCACHE_SIZE=4000 -DPGF90 -Davoidalloc -DNGZhalf \
-DMPI_BLOCK=8000 -DRPROMU_DGEMV -DRACCMU_DGEMV

SCA=

LIB = -L../vasp.5.lib -ldmy \
../vasp.5.lib/linpack_double.o $(LAPACK) \
$(SCA) $(BLAS)

FFT3D = fftmpi.o fftmpi_map.o fft3dfurth.o fft3dlib.o


The other parts are the same as in the standard VASP makefile.

Thanks in advance

admin
Administrator
Posts: 2921
Joined: Tue Aug 03, 2004 8:18 am
License Nr.: 458


#2 Post by admin » Thu Sep 05, 2013 1:39 pm

This looks more like a problem with your MPI installation or the interconnect hardware. It is certainly not related to VASP.

akuritu


#3 Post by akuritu » Tue Feb 11, 2014 7:28 am

Yes, it was a problem with MPI. I installed Open MPI at user level and recompiled VASP using mpif90. VASP now runs fine on more than 32 cores.
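For anyone hitting the same problem, the makefile change for such a rebuild might look roughly like the fragment that follows. It is a minimal sketch, not the poster's actual file: it assumes Open MPI was configured with the Intel compilers and installed under a hypothetical prefix `$(HOME)/openmpi`, and only `FC`/`FCL` differ from the makefile in the first post.

```makefile
# Sketch only: assumes Open MPI was built at user level with the Intel
# compilers (e.g. configured with CC=icc CXX=icpc FC=ifort) and installed
# under the hypothetical prefix $(HOME)/openmpi.
MPI_HOME = $(HOME)/openmpi

# Open MPI's Fortran wrapper replaces Intel MPI's mpiifort; it invokes
# whichever Fortran compiler Open MPI was configured with.
FC  = $(MPI_HOME)/bin/mpif90
FCL = $(FC)

# The sequential MKL BLAS/LAPACK libraries do not depend on MPI and can stay
# unchanged. SCA is empty here; if ScaLAPACK were enabled, the MKL BLACS
# library would have to match the MPI, e.g. -lmkl_blacs_openmpi_lp64
# instead of -lmkl_blacs_intelmpi_lp64.
SCA =
```

The old object files should be removed before rebuilding so that nothing compiled against Intel MPI remains, and the resulting binary has to be launched with the `mpirun` from the same Open MPI installation (with its `lib` directory on `LD_LIBRARY_PATH`), since different MPI implementations are not binary compatible.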

Thanks for the reply.
