parallel compile problems on Opteron quadcore cluster

Questions regarding the compilation of VASP on various platforms: hardware, compilers and libraries, etc.


Moderators: Global Moderator, Moderator

NathanPinney

parallel compile problems on Opteron quadcore cluster

#1 Post by NathanPinney » Mon Jan 12, 2009 5:14 am

I’m having trouble compiling VASP for parallel operation on our new Infiniband-networked cluster. Each node has two quad-core Opterons (8 cores per node) and runs Rocks OS (Red Hat based).

Here are the compile steps I’m using:

1. Compile ATLAS libraries from source using GNU compilers (gcc, gfortran)
2. Extract VASP source from tarball
3. Compile VASP libraries using GNU compilers (all serial, of course)
4. Compile VASP executable using mvapich2 infiniband-compliant compilers (all built using gnu compilers on our machine)

We have successfully used an identical install process for VASP on our old cluster, but with older compiler versions. The above compile sequence results in parallel jobs that show the following error in the OUTCAR, but only occasionally (roughly 50% of the time):

Code: Select all

----------------------------------------- Iteration 3( 17)---------------------------------
POTLOK: VPU time 0.77: CPU time 0.77
SETDIJ: VPU time 0.08: CPU time 0.08
Error EDDDAV: Call to ZHEGV failed. Returncode = 64 464
The error above is observed on all attempted combinations of processors and nodes (e.g. 4 procs each on 4 nodes, 8 procs each on 2 nodes, etc.). Failure does not seem to be tied to specific nodes: a job will complete successfully on node 28, then a different job will fail on node 28 at a later time. Also, failure is not immediate: the job above went through almost three complete ionic iterations before failing.
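For reference, LAPACK documents the INFO value returned by ZHEGV as follows: 0 is success, a negative value flags an illegal argument, a value in 1..N means the eigensolver itself failed to converge, and a value above N means the leading minor of order INFO-N of the overlap matrix B is not positive definite. A small decoder sketch (the function name is mine, not part of VASP or LAPACK):

```python
def decode_zhegv_info(info: int, n: int) -> str:
    """Interpret the INFO value returned by LAPACK's ZHEGV for an
    n x n generalized Hermitian eigenproblem A*x = lambda*B*x."""
    if info == 0:
        return "success"
    if info < 0:
        return f"argument {-info} had an illegal value"
    if info <= n:
        # 1..n: ZHEEV stage failed; info off-diagonal elements of an
        # intermediate tridiagonal form did not converge to zero
        return f"eigensolver failed to converge ({info} elements)"
    # info > n: Cholesky factorization of B failed
    return (f"leading minor of order {info - n} of the overlap matrix B "
            "is not positive definite")
```

If the returncode exceeds the sub-space dimension, this points at a broken (non-positive-definite) overlap matrix rather than at the eigensolver itself, which would be consistent with corrupted matrix data rather than a physics problem.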

Small systems (2 atoms) complete successfully more often than do large systems (~30 atoms).

When I compile the executable for serial operation, using a serial compiler (gfortran) instead of the MPI wrappers, everything runs smoothly (albeit slowly) on one processor.

We’ve also tried using the following MPI compilers:
-openmpi, built with gcc compilers (fortran compiler = mpif90)
-mvapich-1.0.0, built with intel compilers (fortran compiler = mpif90)
with similar results (the same EDDDAV error shown above)

Additionally, jobs that do complete often show the following warning in the standard output file:

Code: Select all

 
WARNING: Sub-Space-Matrix is not hermitian in DAV 11 -1017601.45907357
I've seen reference to errors like this here, but in the context of a serial compile. Adding the lines suggested in the link did not resolve the error.
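For what it's worth, the warning above fires when the sub-space matrix differs from its own conjugate transpose by more than a tolerance. A stdlib-only sketch of that kind of check (the matrix layout and example values are my illustration, not VASP's internals):

```python
def max_hermiticity_error(m):
    """Return the largest |m[i][j] - conj(m[j][i])| for a square
    complex matrix given as a list of lists. Zero for a Hermitian matrix."""
    n = len(m)
    return max(abs(m[i][j] - m[j][i].conjugate())
               for i in range(n) for j in range(n))

# A Hermitian matrix: real diagonal, m[i][j] == conj(m[j][i])
good = [[2.0 + 0j, 1.0 - 1j],
        [1.0 + 1j, 3.0 + 0j]]

# The same matrix with one off-diagonal element corrupted
bad = [[2.0 + 0j, 1.0 - 1j],
       [1.0 - 1j, 3.0 + 0j]]

print(max_hermiticity_error(good))  # 0.0
print(max_hermiticity_error(bad))   # 2.0
```

A matrix built by summing contributions across MPI ranks can fail such a check if the reduction delivers garbage, even when the same serial build is fine, which matches the serial-works/parallel-fails pattern described above.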

Any advice or suggestions are greatly appreciated!

Many thanks,
Nate at University of Wisconsin-Madison

Makefile is below:

Code: Select all

.SUFFIXES: .inc .f .f90 .F
#-----------------------------------------------------------------------
#comments section, removed for brevity
#-----------------------------------------------------------------------

# all CPP processed fortran files have the extension .f90
SUFFIX=.f90

#-----------------------------------------------------------------------
# fortran compiler and linker
#-----------------------------------------------------------------------
#FC=/opt/intel/fce/9.0/bin/ifort
# fortran linker
#FCL=$(FC)


#-----------------------------------------------------------------------
# whereis CPP ?? (I need CPP, can't use gcc with proper options)
# that's the location of gcc for SUSE 5.3
#
#  CPP_   =  /usr/lib/gcc-lib/i486-linux/2.7.2/cpp -P -C 
#
# that's probably the right line for some Red Hat distribution:
#
#  CPP_   =  /usr/lib/gcc-lib/i386-redhat-linux/2.7.2.3/cpp -P -C
#
#  SUSE X.X, maybe some Red Hat distributions:

CPP_ =  ./preprocess <$*.F | /usr/bin/cpp -P -C -traditional >$*$(SUFFIX)

#-----------------------------------------------------------------------
# possible options for CPP:
# NGXhalf             charge density   reduced in X direction
# wNGXhalf            gamma point only reduced in X direction
# avoidalloc          avoid ALLOCATE if possible
# IFC                 work around some IFC bugs
# CACHE_SIZE          1000 for PII,PIII, 5000 for Athlon, 8000-12000 P4
# RPROMU_DGEMV        use DGEMV instead of DGEMM in RPRO (depends on used BLAS)
# RACCMU_DGEMV        use DGEMV instead of DGEMM in RACC (depends on used BLAS)
# for Atlas  -DRPROMU_DGEMV is recommended
#-----------------------------------------------------------------------

CPP     = $(CPP_)  -DHOST=\"LinuxIFC_ath\" \
          -Dkind8 -DNGXhalf -DCACHE_SIZE=5000 -DPGF90 -Davoidalloc \
          -DRPROMU_DGEMV 

#-----------------------------------------------------------------------
# general fortran flags  (there must be a trailing blank on this line)
#-----------------------------------------------------------------------

#FFLAGS =   

#-----------------------------------------------------------------------
# optimization
# we have tested whether higher optimisation improves performance
# -axK  SSE1 optimization,  but also generate code executable on all mach.
#       xK improves performance somewhat on XP, and a is required in order
#       to run the code on older Athlons as well
# -xW   SSE2 optimization
# -axW  SSE2 optimization,  but also generate code executable on all mach.
# -tpp6 P3 optimization
# -tpp7 P4 optimization
#-----------------------------------------------------------------------

OFLAG= -O1 

OFLAG_HIGH = $(OFLAG)
OBJ_HIGH = 

OBJ_NOOPT = 
DEBUG  =  -O0
INLINE = $(OFLAG)


#-----------------------------------------------------------------------
# the following lines specify the position of BLAS  and LAPACK
# on Athlon, VASP works fastest with the Atlas library
# so that's what I recommend
#-----------------------------------------------------------------------

# Atlas based libraries
ATLASHOME= /usr/local/src/ATLAS/lib
##/root/downloads/ATLAS/ATLAS/lib/Linux_HAMMER64SSE2
BLAS=   -L$(ATLASHOME)  -lf77blas -latlas

# use the mkl Intel libraries for p4 (www.intel.com)
# mkl.5.1
# set -DRPROMU_DGEMV  -DRACCMU_DGEMV in the CPP lines
#BLAS=-L/opt/intel/mkl/lib/32 -lmkl_p4  -lpthread

# mkl.5.2 requires also to -lguide library
# set -DRPROMU_DGEMV  -DRACCMU_DGEMV in the CPP lines
#BLAS=-L/opt/intel/mkl/lib/32 -lmkl_p4 -lguide -lpthread

# even faster Kazushige Goto's BLAS
# http://www.cs.utexas.edu/users/kgoto/signup_first.html
#BLAS=  /opt/libs/libgoto/libgoto_p4_512-r0.6.so

# LAPACK, simplest use vasp.4.lib/lapack_double
#LAPACK= ../vasp.4.lib/lapack_double.o

# use atlas optimized part of lapack 
LAPACK= ../vasp.4.lib/lapack_atlas.o -llapack -lcblas

# use the mkl Intel lapack
#LAPACK= -lmkl_lapack

#-----------------------------------------------------------------------

#LIB  = -L../vasp.4.lib -ldmy \
#     ../vasp.4.lib/linpack_double.o $(LAPACK) \
#     $(BLAS) 

# options for linking (for compiler version 6.X, 7.1) nothing is required
#LINK    = 
# compiler version 7.0 generates some vector statements which are located
# in the svml library, add the LIBPATH and the library (just in case)
#LINK    =  -L/opt/intel/compiler70/ia32/lib/ -lsvml 

#-----------------------------------------------------------------------
# fft libraries:
# VASP.4.6 can use fftw.3.0.X (http://www.fftw.org)
# since this version is faster on P4 machines, we recommend using it
#-----------------------------------------------------------------------

#FFT3D   = fft3dfurth.o fft3dlib.o
#FFT3D   = fftw3d.o fft3dlib.o   /opt/libs/fftw-3.0.1/lib/libfftw3.a


#=======================================================================
# MPI section, uncomment the following lines
# 
# one comment for users of mpich or lam:
# You must *not* compile mpi with g77/f77, because f77/g77             
# appends *two* underscores to symbols that contain already an        
# underscore (i.e. MPI_SEND becomes mpi_send__).  The pgf90/ifc
# compilers however append only one underscore.
# Precompiled mpi version will also not work !!!
#
# We found that mpich.1.2.1 and lam-6.5.X to lam-7.0.4 are stable
# mpich.1.2.1 was configured with 
#  ./configure -prefix=/usr/local/mpich_nodvdbg -fc="pgf77 -Mx,119,0x200000"  \
# -f90="pgf90 -Mx,119,0x200000" \
# --without-romio --without-mpe -opt=-O \
# 
# lam was configured with the line
#  ./configure  -prefix /opt/libs/lam-7.0.4 --with-cflags=-O -with-fc=ifc \
# --with-f77flags=-O --without-romio
# 
# please note that you might be able to use a lam or mpich version 
# compiled with f77/g77, but then you need to add the following
# options: -Msecond_underscore (compilation) and -g77libs (linking)
#
# !!! Please do not send me any queries on how to install MPI, I will
# certainly not answer them !!!!
#=======================================================================
#-----------------------------------------------------------------------
# fortran linker for mpi: if you use LAM and compiled it with the options
# suggested above,  you can use the following line
#-----------------------------------------------------------------------
FC=/share/apps/mvapich2/ohioState/1.2p1-gnu/bin/mpif90
#FC=/usr/mpi/gcc/openmpi-1.2.5/bin/mpif90
#FC= /usr/mpi/pgi/mvapich-1.0.0/bin/mpif90  
#FC= /opt/intel/impi/3.1/bin64/mpiifort -static_mpi
FCL=$(FC)

#-----------------------------------------------------------------------
# additional options for CPP in parallel version (see also above):
# NGZhalf               charge density   reduced in Z direction
# wNGZhalf              gamma point only reduced in Z direction
# scaLAPACK             use scaLAPACK (usually slower on 100 Mbit Net)
# 1000 or 2000 are the optimal CACHE_SIZE for the parallel version
# and IFC on Athlon XP (gK)
#-----------------------------------------------------------------------

CPP    = $(CPP_) -DMPI  -DHOST=\"LinuxIFC_ath\" -DIFC \
     -Dkind8 -DNGZhalf -DCACHE_SIZE=2000 -DPGF90 -Davoidalloc \
     -DRPROMU_DGEMV

#-----------------------------------------------------------------------
# location of SCALAPACK
# if you do not use SCALAPACK simply uncomment the line SCA
#-----------------------------------------------------------------------

#BLACS=$(HOME)/archives/SCALAPACK/BLACS/
#SCA_=$(HOME)/archives/SCALAPACK/SCALAPACK

#SCA= $(SCA_)/libscalapack.a  \
# $(BLACS)/LIB/blacsF77init_MPI-LINUX-0.a $(BLACS)/LIB/blacs_MPI-LINUX-0.a $(BLACS)/LIB/blacsF77init_MPI-LINUX-0.a

SCA=

#-----------------------------------------------------------------------
# libraries for mpi
#-----------------------------------------------------------------------

LIB     = -L../vasp.4.lib -ldmy  \
      ../vasp.4.lib/linpack_double.o $(LAPACK) \
      $(SCA) $(BLAS) 
##-static

# FFT: fftmpi.o with fft3dlib of Juergen Furthmueller
FFT3D   = fftmpi.o fftmpi_map.o fft3dlib.o 

# fftw.3.0.1 is slightly faster and should be used if available
#FFT3D   = fftmpiw.o fftmpi_map.o fft3dlib.o   /opt/libs/fftw-3.0.1/lib/libfftw3.a

#-----------------------------------------------------------------------
# general rules and compile lines
#-----------------------------------------------------------------------
BASIC=   symmetry.o symlib.o   lattlib.o  random.o   

SOURCE=  base.o     mpi.o      smart_allocate.o      xml.o  \
         constant.o jacobi.o   main_mpi.o  scala.o   \
         asa.o      lattice.o  poscar.o   ini.o      setex.o     radial.o  \
         pseudo.o   mgrid.o    mkpoints.o wave.o      wave_mpi.o  $(BASIC) \
         nonl.o     nonlr.o    dfast.o    choleski2.o    \
         mix.o      charge.o   xcgrad.o   xcspin.o    potex1.o   potex2.o  \
         metagga.o  constrmag.o pot.o      cl_shift.o force.o    dos.o      elf.o      \
         tet.o      hamil.o    steep.o    \
         chain.o    dyna.o     relativistic.o LDApU.o sphpro.o  paw.o   us.o \
         ebs.o      wavpre.o   wavpre_noio.o broyden.o \
         dynbr.o    rmm-diis.o reader.o   writer.o   tutor.o xml_writer.o \
         brent.o    stufak.o   fileio.o   opergrid.o stepver.o  \
         dipol.o    xclib.o    chgloc.o   subrot.o   optreal.o   davidson.o \
         edtest.o   electron.o shm.o      pardens.o  paircorrection.o \
         optics.o   constr_cell_relax.o   stm.o    finite_diff.o \
         elpol.o    setlocalpp.o 
 
INC=

vasp: $(SOURCE) $(FFT3D) $(INC) main.o 
        rm -f vasp
        $(FCL) -o vasp $(LINK) main.o  $(SOURCE)   $(FFT3D) $(LIB) 
makeparam: $(SOURCE) $(FFT3D) makeparam.o main.F $(INC)
        $(FCL) -o makeparam  $(LINK) makeparam.o $(SOURCE) $(FFT3D) $(LIB)
zgemmtest: zgemmtest.o base.o random.o $(INC)
        $(FCL) -o zgemmtest $(LINK) zgemmtest.o random.o base.o $(LIB)
dgemmtest: dgemmtest.o base.o random.o $(INC)
        $(FCL) -o dgemmtest $(LINK) dgemmtest.o random.o base.o $(LIB) 
ffttest: base.o smart_allocate.o mpi.o mgrid.o random.o ffttest.o $(FFT3D) $(INC)
        $(FCL) -o ffttest $(LINK) ffttest.o mpi.o mgrid.o random.o smart_allocate.o base.o $(FFT3D) $(LIB)

kpoints: $(SOURCE) $(FFT3D) makekpoints.o main.F $(INC)
        $(FCL) -o kpoints $(LINK) makekpoints.o $(SOURCE) $(FFT3D) $(LIB)

clean:
        -rm -f *.g *.f *.o *.L *.mod ; touch *.F

main.o: main$(SUFFIX)
        $(FC) $(FFLAGS)$(DEBUG)  $(INCS) -c main$(SUFFIX)
xcgrad.o: xcgrad$(SUFFIX)
        $(FC) $(FFLAGS) $(INLINE)  $(INCS) -c xcgrad$(SUFFIX)
xcspin.o: xcspin$(SUFFIX)
        $(FC) $(FFLAGS) $(INLINE)  $(INCS) -c xcspin$(SUFFIX)

makeparam.o: makeparam$(SUFFIX)
        $(FC) $(FFLAGS)$(DEBUG)  $(INCS) -c makeparam$(SUFFIX)

makeparam$(SUFFIX): makeparam.F main.F 
#
# MIND: I do not have a full dependency list for the include
# and MODULES: here are only the minimal basic dependencies
# if one structure is changed then touch_dep must be called
# with the corresponding name of the structure
#
base.o: base.inc base.F
mgrid.o: mgrid.inc mgrid.F
constant.o: constant.inc constant.F
lattice.o: lattice.inc lattice.F
setex.o: setexm.inc setex.F
pseudo.o: pseudo.inc pseudo.F
poscar.o: poscar.inc poscar.F
mkpoints.o: mkpoints.inc mkpoints.F
wave.o: wave.inc wave.F
nonl.o: nonl.inc nonl.F
nonlr.o: nonlr.inc nonlr.F

$(OBJ_HIGH):
        $(CPP)
        $(FC) $(FFLAGS) $(OFLAG_HIGH) $(INCS) -c $*$(SUFFIX)
$(OBJ_NOOPT):
        $(CPP)
        $(FC) $(FFLAGS) $(INCS) -c $*$(SUFFIX)

fft3dlib_f77.o: fft3dlib_f77.F
        $(CPP)
        $(F77) $(FFLAGS_F77) -c $*$(SUFFIX)

.F.o:
        $(CPP)
        $(FC) $(FFLAGS) $(OFLAG) $(INCS) -c $*$(SUFFIX)
.F$(SUFFIX):
        $(CPP)
$(SUFFIX).o:
        $(FC) $(FFLAGS) $(OFLAG) $(INCS) -c $*$(SUFFIX)

# special rules
#-----------------------------------------------------------------------
# -tpp5|6|7 P, PII-PIII, PIV
# -xW use SIMD (does not pay off on PII, since fft3d uses double prec)
# all other options do no affect the code performance since -O1 is used
#-----------------------------------------------------------------------

fft3dlib.o : fft3dlib.F
        $(CPP)
        $(FC)  -lowercase -O1 -unroll0 -c $*$(SUFFIX)
        $(CPP)
        $(FC)  -lowercase -O1 -c $*$(SUFFIX)

lattlib.o: lattlib.F
        $(CPP)
        $(FC)  -lowercase -O1 -c $*$(SUFFIX)

radial.o : radial.F
        $(CPP)
        $(FC)  -lowercase -O1 -c $*$(SUFFIX)

symlib.o : symlib.F
        $(CPP)
        $(FC)  -lowercase -O1 -c $*$(SUFFIX)

symmetry.o : symmetry.F
        $(CPP)
        $(FC)  -lowercase -O1 -c $*$(SUFFIX)

dynbr.o : dynbr.F
        $(CPP)
        $(FC)  -lowercase -O1 -c $*$(SUFFIX)

us.o : us.F
        $(CPP)
        $(FC)  -lowercase -O1 -c $*$(SUFFIX)

broyden.o : broyden.F
        $(CPP)
        $(FC)  -lowercase -O1 -c $*$(SUFFIX)

wave.o : wave.F
        $(CPP)
        $(FC)  -lowercase -O0 -c $*$(SUFFIX)

LDApU.o : LDApU.F
        $(CPP)
        $(FC)  -lowercase -O1 -c $*$(SUFFIX)
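The Makefile's MPI-section warning about underscores is worth checking on a new toolchain: g77/f77 append two underscores to external names that already contain one, while pgf90/ifc append only one, so VASP cannot link against an MPI library built with the former. A toy model of the two schemes (simplified; real mangling also depends on compiler flags such as -fsecond-underscore):

```python
def mangle(name: str, double_underscore: bool) -> str:
    """Map a Fortran external name to its linker symbol.
    double_underscore=True models g77/f77, False models pgf90/ifc."""
    base = name.lower()
    if double_underscore and "_" in base:
        return base + "__"   # g77: two trailing underscores if the name has one
    return base + "_"        # otherwise a single trailing underscore

print(mangle("MPI_SEND", double_underscore=True))   # mpi_send__
print(mangle("MPI_SEND", double_underscore=False))  # mpi_send_
```

If the two schemes disagree for a symbol, the link fails with an undefined reference, so the MPI library and VASP must be built with compilers that agree on the convention.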

admin
Administrator
Posts: 2921
Joined: Tue Aug 03, 2004 8:18 am
License Nr.: 458

parallel compile problems on Opteron quadcore cluster

#2 Post by admin » Fri Jan 23, 2009 12:18 pm

the fact that the error occurs in the third ionic step rather indicates that
there was something wrong with your calculation. Please specify: did you run
the IDENTICAL job on different machines already?

NathanPinney

parallel compile problems on Opteron quadcore cluster

#3 Post by NathanPinney » Wed Jan 28, 2009 12:29 am

Thanks for the reply.

I cross-checked identical jobs on our two clusters. Our old cluster (cluster A) runs vasp.4.6.28 on an Infiniband network. The job on cluster A runs with no errors and completes normally.

Cluster B (the new system I'm installing) runs vasp.4.6.28 over Infiniband, but an identical job dies after several iterations with the errors described above, including the "Sub-Space-Matrix is not hermitian" warning in standard output.

"identical" = copied POSCAR, POTCAR, INCAR, KPOINTS from A to B. VASP executables are obviously different when built on different platforms. Run on same number of processors, but in different distributions (B is quadcore, A is dual-core, all Opteron).

Thanks again for suggesting this test. Will update if/when we get a parallel executable working reliably!
