Parallel Vasp successfully compiled (Intel x86_64, 16 core, OpenMPI, GotoBLAS, Intel-Fortran-Comp.)

Questions regarding the compilation of VASP on various platforms: hardware, compilers and libraries, etc.


Moderators: Global Moderator, Moderator

Meister Krause

Parallel Vasp successfully compiled (Intel x86_64, 16 core, OpenMPI, GotoBLAS, Intel-Fortran-Comp.)

#1 Post by Meister Krause » Mon Jul 14, 2008 6:57 pm

We succeeded in compiling a parallel version of Vasp and want to share the experience.

system:
- 4 x Intel Xeon Quadcore, so 16 cores in total
- OpenSuse Linux 10.3 (X86-64), 64 bit
- Intel Fortran-Compiler 10.1
- Intel C/C++-Compiler 10.1
- OpenMPI 1.2.6
- GotoBLAS 1.26
- Vasp 4.6.34

steps:
- install OpenMPI
- compile the Blas libraries
- build the vasp libraries
- build vasp
- run Hg benchmark

Before we start, we set the environment variables for the different compilers so we don't have to specify them on the command line each and every time, for example in .bashrc for the bash shell:

FC=/opt/intel/fce/10.1.012/bin/ifort ; export FC
CXX=/opt/intel/cce/10.1.015/bin/icpc ; export CXX
CC=/opt/intel/cce/10.1.015/bin/icc ; export CC
F77=/opt/intel/fce/10.1.012/bin/ifort ; export F77

#OpenMPI#
Installing OpenMPI is easy since it comes with a configure script (!): we just need to specify the prefix for the installation folder, then build and install it:
./configure --prefix=/openmpi-installation-folder
make all install
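Afterwards the OpenMPI binaries and libraries have to be reachable; a minimal sketch, assuming the installation prefix /opt/openmpi-1.2.6 (these lines can also go into .bashrc):

```shell
# make mpif90, mpirun and the MPI shared libraries findable
# (prefix is an assumption - use whatever you gave to ./configure)
export PATH=/opt/openmpi-1.2.6/bin:$PATH
export LD_LIBRARY_PATH=/opt/openmpi-1.2.6/lib:$LD_LIBRARY_PATH

# confirm the bin directory really is on PATH
case ":$PATH:" in
  *":/opt/openmpi-1.2.6/bin:"*) echo "OpenMPI on PATH";;
esac
```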

#GotoBlas#
Installing GotoBLAS was the most confusing part! If you just run the quickbuild.64bit script, it checks whether you have a multi-CPU (SMP) environment and builds threaded BLAS libraries, i.e. libraries that already make use of your multiple processors. But if you do so, you end up with a parallel version of vasp that is
"mindblowing slow", at least in my case! Furthermore, the script searches for installed Fortran compilers in a fixed order, and if you have several Fortran compilers installed it might pick one other than the Intel Fortran compiler (which we don't want, since we want to use the same compiler for BLAS and vasp).
To circumvent all this you could modify the makefiles by hand or, which is the way I did it, modify the "detect" script, which does all the detection of compilers and SMP. So we prevent the detect script from looking for other compilers and from using SMP. What follows is the "detect" file I used (I cut off the last part where I made no changes!):
#########################################
rm -f getarch_cmd
rm -f getarch_cmd.exe

make clean

FCOMPILER=NULL

##which g77 > /dev/null 2> /dev/null
##if [ 0 == $? ]; then
##FCOMPILER=G77
##fi

##which g95 > /dev/null 2> /dev/null
##if [ 0 == $? ]; then
##FCOMPILER=G95
##fi

##which gfortran > /dev/null 2> /dev/null
##if [ 0 == $? ]; then
##FCOMPILER=GFORTRAN
##fi

which ifort > /dev/null 2> /dev/null ##comment out everything but ifort
if [ 0 == $? ]; then
FCOMPILER=INTEL
fi

##which pgf77 > /dev/null 2> /dev/null
##if [ 0 == $? ]; then
##FCOMPILER=PGI
##fi

##which pathf90 > /dev/null 2> /dev/null
##if [ 0 == $? ]; then
##FCOMPILER=PATHSCALE
##fi

##which xlf > /dev/null 2> /dev/null
##if [ 0 == $? ]; then
##FCOMPILER=IBM
##fi

HAS_SMP=0

##NUM_CPU=`cat /proc/cpuinfo | grep -c processor`
##if [ $NUM_CPU -gt 1 ]; then
##HAS_SMP=1 ##prevent the check for SMP
##fi

#############################################
I know that this is a clumsy way of doing it; one could just as easily edit Makefile.rule by hand.
Anyway, it worked for me.
Then run
./quickbuild.64bit
and you should end up with a nice BLAS library, in my case "libgoto_core2-r1.26.so".

#Vasp.4.lib#
The vasp libraries are easy again: just
cp makefile.linux_ifc_P4 Makefile
edit FC to match the Intel Fortran compiler if necessary, and
make

#Vasp.4.6#
After 10 days of trying, this also felt easy in the end.
cp makefile.linux_ifc_P4 Makefile
- possibly edit the FC line
- change the OFLAG line from -O3 to -O1 (OFLAG=-O1 -xW -tpp7 in my case);
this is just something I read in the forum, but if I don't do it I end up with memory allocation errors when running vasp!
- edit the path to the BLAS libraries
- possibly edit the CPP preprocessor flags (the lower set, after the MPI section)
- edit the path to your MPI Fortran wrapper compiler (/opt/openmpi-1.2.6/bin/mpif90 in my case)
- uncomment the MPI libraries
- uncomment the FFT (FFT3D) libraries in the MPI part
What follows is the vasp makefile (I cut off the last part where I made no changes!):
##########################################
.SUFFIXES: .inc .f .f90 .F
#-----------------------------------------------------------------------
# Makefile for Intel Fortran compiler for P4 systems
#
# The makefile was tested only under Linux on Intel platforms
# (Suse 5.3- Suse 9.0)
# the following compiler versions have been tested
# 5.0, 6.0, 7.0 and 7.1 (some 8.0 versions seem to fail compiling the code)
# presently we recommend version 7.1 or 7.0, since these
# releases have been used to compile the present code versions
#
# it might be required to change some of the library paths, since
# LINUX installations vary a lot
# Hence check ***ALL**** options in this makefile very carefully
#-----------------------------------------------------------------------
#
# BLAS must be installed on the machine
# there are several options:
# 1) very slow but works:
# retrieve the lapackage from ftp.netlib.org
# and compile the blas routines (BLAS/SRC directory)
# please use g77 or f77 for the compilation. When I tried to
# use pgf77 or pgf90 for BLAS, VASP hung when calling
# ZHEEV (however this was with lapack 1.1; now I use lapack 2.0)
# 2) most desirable: get an optimized BLAS
#
# the two most reliable packages around are presently:
# 3a) Intels own optimised BLAS (PIII, P4, Itanium)
# http://developer.intel.com/software/products/mkl/
# this is really excellent when you use Intel CPU's
#
# 3b) or obtain the atlas based BLAS routines
# http://math-atlas.sourceforge.net/
# you certainly need atlas on the Athlon, since the mkl
# routines are not optimal on the Athlon.
# If you want to use atlas based BLAS, check the lines around LIB=
#
# 3c) mindblowing fast SSE2 (4 GFlops on P4, 2.53 GHz)
# Kazushige Goto's BLAS
# http://www.cs.utexas.edu/users/kgoto/signup_first.html
#
#-----------------------------------------------------------------------

# all CPP processed fortran files have the extension .f90
SUFFIX=.f90

#-----------------------------------------------------------------------
# fortran compiler and linker
#-----------------------------------------------------------------------
FC=/opt/intel/fce/10.1.012/bin/ifort
# fortran linker
FCL=$(FC)


#-----------------------------------------------------------------------
# whereis CPP ?? (I need CPP, can't use gcc with proper options)
# that's the location of gcc for SUSE 5.3
#
# CPP_ = /usr/lib/gcc-lib/i486-linux/2.7.2/cpp -P -C
#
# that's probably the right line for some Red Hat distribution:
#
# CPP_ = /usr/lib/gcc-lib/i386-redhat-linux/2.7.2.3/cpp -P -C
#
# SUSE X.X, maybe some Red Hat distributions:

CPP_ = ./preprocess <$*.F | /usr/bin/cpp -P -C -traditional >$*$(SUFFIX)

#-----------------------------------------------------------------------
# possible options for CPP:
# NGXhalf charge density reduced in X direction
# wNGXhalf gamma point only reduced in X direction
# avoidalloc avoid ALLOCATE if possible
# IFC work around some IFC bugs
# CACHE_SIZE 1000 for PII,PIII, 5000 for Athlon, 8000-12000 P4
# RPROMU_DGEMV use DGEMV instead of DGEMM in RPRO (depends on used BLAS)
# RACCMU_DGEMV use DGEMV instead of DGEMM in RACC (depends on used BLAS)
#-----------------------------------------------------------------------

CPP = $(CPP_) -DHOST=\"LinuxIFC\" \
-Dkind8 -DNGZhalf -DCACHE_SIZE=12000 -Davoidalloc -DMPI -DIFC\
# -DRPROMU_DGEMV -DRACCMU_DGEMV

#-----------------------------------------------------------------------
# general fortran flags (there must be a trailing blank on this line)
#-----------------------------------------------------------------------

FFLAGS = -FR -lowercase -assume byterecl

#-----------------------------------------------------------------------
# optimization
# we have tested whether higher optimisation improves performance
# -axK SSE1 optimization, but also generate code executable on all mach.
# xK improves performance somewhat on XP, and a is required in order
# to run the code on older Athlons as well
# -xW SSE2 optimization
# -axW SSE2 optimization, but also generate code executable on all mach.
# -tpp6 P3 optimization
# -tpp7 P4 optimization
#-----------------------------------------------------------------------

OFLAG=-O1 -xW -tpp7

OFLAG_HIGH = $(OFLAG)
OBJ_HIGH =

OBJ_NOOPT =
DEBUG = -FR -O0
INLINE = $(OFLAG)


#-----------------------------------------------------------------------
# the following lines specify the position of BLAS and LAPACK
# on P4, VASP works fastest with the libgoto library
# so that's what I recommend
#-----------------------------------------------------------------------

# Atlas based libraries
#ATLASHOME= $(HOME)/archives/BLAS_OPT/ATLAS/lib/Linux_P4SSE2/
#BLAS= -L$(ATLASHOME) -lf77blas -latlas

# use specific libraries (default library path might point to other libraries)
#BLAS= $(ATLASHOME)/libf77blas.a $(ATLASHOME)/libatlas.a

# use the mkl Intel libraries for p4 (www.intel.com)
# mkl.5.1
# set -DRPROMU_DGEMV -DRACCMU_DGEMV in the CPP lines
#BLAS=-L/opt/intel/mkl/lib/32 -lmkl_p4 -lpthread

# mkl.5.2 requires also to -lguide library
# set -DRPROMU_DGEMV -DRACCMU_DGEMV in the CPP lines
#BLAS=-L/opt/intel/mkl/lib/32 -lmkl_p4 -lguide -lpthread

# even faster Kazushige Goto's BLAS
# http://www.cs.utexas.edu/users/kgoto/signup_first.html
BLAS=-L/opt/GotoBLAS_not_threaded -lgoto

# LAPACK, simplest use vasp.4.lib/lapack_double
LAPACK= ../vasp.4.lib/lapack_double.o

# use atlas optimized part of lapack
#LAPACK= ../vasp.4.lib/lapack_atlas.o -llapack -lcblas

# use the mkl Intel lapack
#LAPACK= -lmkl_lapack

#-----------------------------------------------------------------------

LIB = -L../vasp.4.lib -ldmy \
../vasp.4.lib/linpack_double.o $(LAPACK) \
$(BLAS)

# options for linking (for compiler version 6.X, 7.1) nothing is required
LINK =
# compiler version 7.0 generates some vector statments which are located
# in the svml library, add the LIBPATH and the library (just in case)
#LINK = -L/opt/intel/compiler70/ia32/lib/ -lsvml

#-----------------------------------------------------------------------
# fft libraries:
# VASP.4.6 can use fftw.3.0.X (http://www.fftw.org)
# since this version is faster on P4 machines, we recommend to use it
#-----------------------------------------------------------------------

#FFT3D = fft3dfurth.o fft3dlib.o
FFT3D = fftw3d.o fft3dlib.o /opt/libs/fftw-3.0.1/lib/libfftw3.a


#=======================================================================
# MPI section, uncomment the following lines
#
# one comment for users of mpich or lam:
# You must *not* compile mpi with g77/f77, because f77/g77
# appends *two* underscores to symbols that contain already an
# underscore (i.e. MPI_SEND becomes mpi_send__). The pgf90/ifc
# compilers however append only one underscore.
# Precompiled mpi version will also not work !!!
#
# We found that mpich.1.2.1 and lam-6.5.X to lam-7.0.4 are stable
# mpich.1.2.1 was configured with
# ./configure -prefix=/usr/local/mpich_nodvdbg -fc="pgf77 -Mx,119,0x200000" \
# -f90="pgf90 " \
# --without-romio --without-mpe -opt=-O \
#
# lam was configured with the line
# ./configure -prefix /opt/libs/lam-7.0.4 --with-cflags=-O -with-fc=ifc \
# --with-f77flags=-O --without-romio
#
# please note that you might be able to use a lam or mpich version
# compiled with f77/g77, but then you need to add the following
# options: -Msecond_underscore (compilation) and -g77libs (linking)
#
# !!! Please do not send me any queries on how to install MPI, I will
# certainly not answer them !!!!
#=======================================================================
#-----------------------------------------------------------------------
# fortran linker for mpi: if you use LAM and compiled it with the options
# suggested above, you can use the following line
#-----------------------------------------------------------------------

FC=/opt/openmpi-1.2.6/bin/mpif90
FCL=$(FC)

#-----------------------------------------------------------------------
# additional options for CPP in parallel version (see also above):
# NGZhalf charge density reduced in Z direction
# wNGZhalf gamma point only reduced in Z direction
# scaLAPACK use scaLAPACK (usually slower on 100 Mbit Net)
#-----------------------------------------------------------------------

CPP = $(CPP_) -DMPI -DHOST=\"LinuxIFC\" -DIFC \
-Dkind8 -DNGZhalf -DCACHE_SIZE=4000 -Davoidalloc \
-DMPI_BLOCK=500 \
# -DRPROMU_DGEMV -DRACCMU_DGEMV

#-----------------------------------------------------------------------
# location of SCALAPACK
# if you do not use SCALAPACK simply uncomment the line SCA
#-----------------------------------------------------------------------

BLACS=$(HOME)/archives/SCALAPACK/BLACS/
SCA_=$(HOME)/archives/SCALAPACK/SCALAPACK

#SCA= $(SCA_)/libscalapack.a \
# $(BLACS)/LIB/blacsF77init_MPI-LINUX-0.a $(BLACS)/LIB/blacs_MPI-LINUX-0.a $(BLACS)/LIB/blacsF77init_MPI-LINUX-0.a

SCA=

#-----------------------------------------------------------------------
# libraries for mpi
#-----------------------------------------------------------------------

LIB = -L../vasp.4.lib -ldmy \
../vasp.4.lib/linpack_double.o $(LAPACK) \
$(SCA) $(BLAS)


# FFT: fftmpi.o with fft3dlib of Juergen Furthmueller
#FFT3D = fftmpi.o fftmpi_map.o fft3dlib.o

# fftw.3.0.1 is slightly faster and should be used if available
FFT3D = fftmpiw.o fftmpi_map.o fft3dlib.o /opt/libs/fftw-3.0.1/lib/libfftw3.a

#-----------------------------------------------------------------------
# general rules and compile lines
#-----------------------------------------------------------------------
##################################################################

#Hg benchmark#

Finally I ended up with a working parallel version of vasp.

If I run the Hg benchmark using all 16 cores with
mpirun -np 16 vasp-install-dir/vasp
it takes 24 seconds, compared to 140 seconds for the single-core version!
Love it!
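For reference, those timings correspond to a speedup of roughly 5.8x on 16 cores, i.e. a parallel efficiency of about 36%; a quick check of the arithmetic:

```shell
# speedup = serial time / parallel time; efficiency = speedup / cores
awk 'BEGIN { s = 140/24; printf "speedup %.1fx, efficiency %.0f%%\n", s, 100*s/16 }'
```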





Last edited by Meister Krause on Mon Jul 14, 2008 6:57 pm, edited 1 time in total.

admin
Administrator
Posts: 2921
Joined: Tue Aug 03, 2004 8:18 am
License Nr.: 458

Parallel Vasp successfully compiled (Intel x86_64, 16 core, OpenMPI, GotoBLAS, Intel-Fortran-Comp.)

#2 Post by admin » Tue Jul 15, 2008 1:33 pm

Dear Meister Krause, thank you very much for sharing this useful information with the vasp-forum community! You will save many vasp users a lot of trouble :)
Last edited by admin on Tue Jul 15, 2008 1:33 pm, edited 1 time in total.

Meister Krause

Parallel Vasp successfully compiled (Intel x86_64, 16 core, OpenMPI, GotoBLAS, Intel-Fortran-Comp.)

#3 Post by Meister Krause » Tue Aug 19, 2008 6:56 am

Back again with some news:

The way we compiled GotoBLAS was wrong (quite apart from the clumsy way we did it).
If you compile the BLAS libraries with threading turned on but the maximum number of threads set to 1, you gain between 33 and 50% speed in larger calculations.
The user configuration part of Makefile.rule:


#
# Beginning of user configuration
#

# This library's version
REVISION = -r1.26

# Which C compiler do you prefer? Default is gcc.
C_COMPILER = GNU
# C_COMPILER = INTEL
# C_COMPILER = PGI

# Now you don't need a Fortran compiler to build the library.
# If you don't specify a Fortran compiler, a GNU g77 compatible
# interface will be used.
# F_COMPILER = G77
# F_COMPILER = G95
# F_COMPILER = GFORTRAN
F_COMPILER = INTEL
# F_COMPILER = PGI
# F_COMPILER = PATHSCALE
# F_COMPILER = IBM
# F_COMPILER = COMPAQ
# F_COMPILER = SUN
# F_COMPILER = F2C

# If you need 64bit binary; some architecture can accept both 32bit and
# 64bit binary(X86_64, SPARC, Power/PowerPC or WINDOWS).
BINARY64 = 1

# If you want to build threaded BLAS
SMP = 1

# You can define maximum number of threads. Basically it should be
# less than actual number of cores. If you don't specify one, it's
# automatically detected by script.
MAX_THREADS = 1

# If you want to use legacy threaded Level 3 implementation.
# Some architecture prefer this algorithm, but it's rare.
# USE_SIMPLE_THREADED_LEVEL3 = 1

# If you want to use GotoBLAS with an accelerator like Cell or GPGPU
# This is experimental and currently won't work well.
# USE_ACCERELATOR = 1

# Define accelerator type (won't work)
# USE_CELL_SPU = 1

# Threads keep working for a while after finishing a BLAS operation
# to reduce thread activate/deactivate overhead. You can set a
# time out to improve performance. This number should be from 4 to 30,
# which corresponds to (1 << n) cycles. For example, if you set it to 26,
# a thread will keep running for (1 << 26) cycles (about 25ms on a 3.0GHz
# system). You can also control this number via GOTO_THREAD_TIMEOUT
# CCOMMON_OPT += -DTHREAD_TIMEOUT=26

# If you need cross compiling
# (you have to set architecture manually in getarch.c!)
# Example : HOST ... G5 OSX, TARGET = CORE2 OSX
# CROSS_SUFFIX = i686-apple-darwin8-
# CROSS_VERSION = -4.0.1
# CROSS_BINUTILS =

# If you need Special memory management;
# Using HugeTLB file system(Linux / AIX / Solaris)
# HUGETLB_ALLOCATION = 1

# Using bigphysarea memory instead of normal allocation to get
# physically contiguous memory.
# BIGPHYSAREA_ALLOCATION = 1

# To get maximum performance with minimum impact on the system,
# mixed memory allocation may be worth trying. In this case,
# you have to define one of ALLOC_HUGETLB or BIGPHYSAREA_ALLOCATION.
# Another allocation will be done by mmap or static allocation.
# (Not implemented yet)
# MIXED_MEMORY_ALLOCATION = 1

# Using static allocation instead of dynamic allocation
# You can't use it with ALLOC_HUGETLB
# STATIC_ALLOCATION = 1

# If you want to use CPU affinity
# CCOMMON_OPT += -DUSE_CPU_AFFINITY

# If you want to use memory affinity (NUMA)
# You can't use it with ALLOC_STATIC
# NUMA_AFFINITY = 1

# If you want to use interleaved memory allocation.
# Default is local allocation(it only works with NUMA_AFFINITY).
# CCOMMON_OPT += -DINTERLEAVED_MAPPING

# If you want to drive the whole 64bit region by BLAS. Not all Fortran
# compilers support this. It's safe to keep it commented out if you
# are not sure.
# INTERFACE64 = 1

# If you have special compiler to run script to determine architecture.
GETARCH_CC +=
GETARCH_FLAGS +=
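A runtime-side note (an assumption based on the GotoBLAS documentation, not something from the build above): even with SMP compiled in, the BLAS thread count can reportedly also be capped at run time via the GOTO_NUM_THREADS environment variable, e.g. in the job script:

```shell
# cap GotoBLAS at one thread per MPI rank so BLAS threads don't
# fight with the MPI processes (GOTO_NUM_THREADS is an assumption
# from the GotoBLAS README - check the README shipped with your version)
export GOTO_NUM_THREADS=1
echo "GOTO_NUM_THREADS=$GOTO_NUM_THREADS"
```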
Last edited by Meister Krause on Tue Aug 19, 2008 6:56 am, edited 1 time in total.

ginggs
Newbie
Posts: 2
Joined: Wed Jul 23, 2008 4:26 pm
License Nr.: 817

Parallel Vasp successfully compiled (Intel x86_64, 16 core, OpenMPI, GotoBLAS, Intel-Fortran-Comp.)

#4 Post by ginggs » Tue Aug 19, 2008 1:29 pm

Thanks again Meister Krause!
Last edited by ginggs on Tue Aug 19, 2008 1:29 pm, edited 1 time in total.

sagarambavale
Newbie
Posts: 1
Joined: Wed Aug 27, 2008 1:05 pm
License Nr.: 965

Parallel Vasp successfully compiled (Intel x86_64, 16 core, OpenMPI, GotoBLAS, Intel-Fortran-Comp.)

#5 Post by sagarambavale » Fri Sep 05, 2008 5:38 am

Hi all and dear Meister,
I got a lot of help from your post. My makefile is almost the same as the one Meister posted, but I still have difficulty compiling vasp-4.6. I am getting errors like:
....
.....
........
/opt/intel/mkl/10.0.1.014/lib/em64t//libmkl_intel_thread.a(ztrsm_omp.o): In function `L_mkl_blas_ztrsm_259__par_loop0':
__tmp_ztrsm_omp.c:(.text+0x72b): undefined reference to `__kmpc_for_static_init_8'
__tmp_ztrsm_omp.c:(.text+0x892): undefined reference to `__kmpc_for_static_fini'
/opt/intel/mkl/10.0.1.014/lib/em64t//libmkl_intel_thread.a(ztrsm_omp.o): In function `L_mkl_blas_ztrsm_276__par_loop1':
__tmp_ztrsm_omp.c:(.text+0xa31): undefined reference to `__kmpc_for_static_init_8'
__tmp_ztrsm_omp.c:(.text+0xb92): undefined reference to `__kmpc_for_static_fini'
/opt/intel/mkl/10.0.1.014/lib/em64t//libmkl_intel_thread.a(mkl_threading.o): In function `MKL_Get_Max_Threads':
__tmp_mkl_threading.c:(.text+0x55): undefined reference to `omp_in_parallel'
__tmp_mkl_threading.c:(.text+0x77): undefined reference to `omp_get_max_threads'
/opt/intel/mkl/10.0.1.014/lib/em64t//libmkl_intel_thread.a(mkl_threading.o): In function `MKL_Domain_Get_Max_Threads':
__tmp_mkl_threading.c:(.text+0x857): undefined reference to `omp_in_parallel'
__tmp_mkl_threading.c:(.text+0x878): undefined reference to `omp_get_max_threads'
/opt/MPI/lam-7.1.4/lam/bin/mpif90: No such file or directory
make: *** [vasp] Error 1
******************************************
Thus it cannot find mpif90, though I have a working LAM-MPI with PWSCF. So where might the problem be?
********************************
system:
- 2 x Intel Xeon Quadcore, so 8 cores in total
- Red-Hat linux 5 (X86-64), 64 bit
- Intel Fortran-Compiler 10.1
- Intel C/C++-Compiler 10.1
- LAM-MPI 7.1.4
- Intel MKL 10.0.1.014
- Vasp 4.6

steps:
- LAM-MPI already working with PWSCF
- MKL working
- vasp library built successfully
****************************************************
Makefile is:
.SUFFIXES: .inc .f .f90 .F
# all CPP processed fortran files have the extension .f90
SUFFIX=.f90
#-----------------------------------------------------------------------
# fortran compiler and linker
#-----------------------------------------------------------------------
FC=/opt/intel/fce/10.1.012/bin/ifort
# fortran linker
FCL=$(FC)
CPP_ = ./preprocess <$*.F | /usr/bin/cpp -P -C -traditional >$*$(SUFFIX)
CPP = $(CPP_) -DHOST=\"LinuxIFC\" \
-Dkind8 -DNGXhalf -DCACHE_SIZE=12000 -Davoidalloc -DMPI -DIFC \
-DRPROMU_DGEMV -DRACCMU_DGEMV
#-----------------------------------------------------------------------
# general fortran flags (there must be a trailing blank on this line)
#-----------------------------------------------------------------------

FFLAGS = -FR -lowercase -assume byterecl

#-----------------------------------------------------------------------

OFLAG=-O1 -xW -tpp7

OFLAG_HIGH = $(OFLAG)
OBJ_HIGH =

OBJ_NOOPT =
DEBUG = -FR -O0
INLINE = $(OFLAG)


#-----------------------------------------------------------------------
BLAS=-L/opt/intel/mkl/10.0.1.014/lib/em64t/ -lmkl_em64t
# LAPACK, simplest use vasp.4.lib/lapack_double
LAPACK= ../vasp.4.lib/lapack_double.o
#-----------------------------------------------------------------------

LIB = -L../vasp.4.lib -ldmy \
../vasp.4.lib/linpack_double.o $(LAPACK) \
$(BLAS)

# options for linking (for compiler version 6.X, 7.1) nothing is required
LINK =-L/opt/intel/fce/10.1.012/lib -lsvml
#-----------------------------------------------------------------------

#FFT3D = fft3dfurth.o fft3dlib.o
FFT3D = fftmpiw.o fftmpi_map.o fft3dlib.o /opt/intel/mkl/10.0.1.014/lib/em64t/libfftw3xf_intel.a


#=======================================================================
#=======================================================================
# MPI section, uncomment the following lines
#
#=======================================================================
#-----------------------------------------------------------------------
# fortran linker for mpi: if you use LAM and compiled it with the options
# suggested above, you can use the following line
#-----------------------------------------------------------------------

FC=/opt/MPI/lam-7.1.4/lam/bin/mpif90
FCL=$(FC)
#-----------------------------------------------------------------------

CPP = $(CPP_) -DMPI -DHOST=\"LinuxIFC\" -DIFC \
-Dkind8 -DNGZhalf -DCACHE_SIZE=4000 -DPGF90 -Davoidalloc \
-DMPI_BLOCK=500 \
# -DRPROMU_DGEMV -DRACCMU_DGEMV -DscaLAPACK
#-----------------------------------------------------------------------
# location of SCALAPACK
# if you do not use SCALAPACK simply uncomment the line SCA
#-----------------------------------------------------------------------

BLACS=$(HOME)/archives/SCALAPACK/BLACS/
SCA_=$(HOME)/archives/SCALAPACK/SCALAPACK

#SCA= $(SCA_)/libscalapack.a \
#$(BLACS)/LIB/blacsF77init_MPI-LINUX-0.a $(BLACS)/LIB/blacs_MPI-LINUX-0.a $(BLACS)/LIB/blacsF77init_MPI-LINUX-0.a

SCA=
#-----------------------------------------------------------------------
# libraries for mpi
#-----------------------------------------------------------------------

LIB = -L../vasp.4.lib -ldmy \
../vasp.4.lib/linpack_double.o $(LAPACK) \
$(SCA) $(BLAS)

# FFT: fftmpi.o with fft3dlib of Juergen Furthmueller
FFT3D = fftmpi.o fftmpi_map.o fft3dlib.o /opt/intel/mkl/10.0.1.014/lib/em64t/libfftw3xf_intel.a

# fftw.3.0.1 is slightly faster and should be used if available
#FFT3D = fftmpiw.o fftmpi_map.o fft3dlib.o /opt/intel/mkl/10.0.1.014/lib/em64t/libfftw3xf_intel.a
#-----------------------------------------------------------------------
# general rules and compile lines
#-----------------------------------------------------------------------
This section is unchanged, so I am not posting it.

***********************************************************************************************
Last edited by sagarambavale on Fri Sep 05, 2008 5:38 am, edited 1 time in total.

Meister Krause

Parallel Vasp successfully compiled (Intel x86_64, 16 core, OpenMPI, GotoBLAS, Intel-Fortran-Comp.)

#6 Post by Meister Krause » Fri Sep 05, 2008 1:51 pm

Hi sagarambavale,

well, the problem with "/opt/MPI/lam-7.1.4/lam/bin/mpif90: No such file or directory" is obvious, but I think the undefined references in mkl above are a problem as well. Is your library built correctly? (I have no clue about mkl!)

So make cannot find your Fortran wrapper compiler; are you sure "/opt/MPI/lam-7.1.4/lam/bin/mpif90" really exists?
Did you copy the "mpif.h" file to your vasp directory (section 3.5.15 MPI in the vasp manual)? My mpif.h did not need any f77/f90 conversion!
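For the mpif.h step, a sketch of the copy (both paths below are assumptions for this LAM installation and have to be adjusted; the cp only succeeds if the include file actually exists):

```shell
# hypothetical locations - adjust both to your installation
MPI_INC=/opt/MPI/lam-7.1.4/lam/include
VASP_SRC=$HOME/vasp.4.6

cp "$MPI_INC/mpif.h" "$VASP_SRC/mpif.h" 2>/dev/null \
  && echo "copied mpif.h" \
  || echo "mpif.h not found - check MPI_INC"
```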
Path problems should be easy to solve.

ciau
Last edited by Meister Krause on Fri Sep 05, 2008 1:51 pm, edited 1 time in total.

yige
Newbie
Posts: 3
Joined: Thu Apr 21, 2005 4:20 pm

Parallel Vasp successfully compiled (Intel x86_64, 16 core, OpenMPI, GotoBLAS, Intel-Fortran-Comp.)

#7 Post by yige » Fri Oct 03, 2008 1:07 am

Dear admin and VASP users:

I have an AMD Opteron system (two AMD Barcelona 2.3 GHz quad-core processors per node) with the PGI 7.2 Fortran compiler and openmpi-1.2.7. I followed the instructions offered by Meister Krause (thanks, Meister Krause, for your great work) to compile GotoBLAS, vasp.4.lib and then VASP. The compilation succeeded. However, when I run VASP on 8 CPUs (the only count I tried), the parallel vasp is not stable. Sometimes it works well; sometimes it just gives me "segmentation error" or "Error EDDDAV: Call to ZHEGV failed". Does anyone have any idea about this problem?

YIGE
Last edited by yige on Fri Oct 03, 2008 1:07 am, edited 1 time in total.

admin
Administrator
Posts: 2921
Joined: Tue Aug 03, 2004 8:18 am
License Nr.: 458

Parallel Vasp successfully compiled (Intel x86_64, 16 core, OpenMPI, GotoBLAS, Intel-Fortran-Comp.)

#8 Post by admin » Fri Oct 03, 2008 11:55 am

1) please make sure that you have turned off hyperthreading.
2) If the error persists for a job that runs safely (with exactly the same input) on a single processor (please set NBANDS explicitly to the number used in the parallel run), please compare the outputs of the parallel and serial runs to check the differences.
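In INCAR terms this means fixing the band count for the serial comparison run (the value below is only a placeholder; use the actual number reported in the parallel run's OUTCAR):

```
NBANDS = 96    ! placeholder - use the value from the parallel OUTCAR
```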
Last edited by admin on Fri Oct 03, 2008 11:55 am, edited 1 time in total.

yige
Newbie
Posts: 3
Joined: Thu Apr 21, 2005 4:20 pm

Parallel Vasp successfully compiled (Intel x86_64, 16 core, OpenMPI, GotoBLAS, Intel-Fortran-Comp.)

#9 Post by yige » Thu Oct 09, 2008 1:35 am

Dear Admin:

(1) How can I turn off hyperthreading? I am not the admin of the cluster I am using.
(2) Following your instructions, I compared the results of my parallel and serial calculations. The results are the same whenever the parallel calculations succeed.

Could you please give me more ideas? Thanks a lot.

Sincerely yours

Zhe Liu
Last edited by yige on Thu Oct 09, 2008 1:35 am, edited 1 time in total.

shahriar
Newbie
Posts: 1
Joined: Wed Jun 13, 2007 4:24 pm

Parallel Vasp successfully compiled (Intel x86_64, 16 core, OpenMPI, GotoBLAS, Intel-Fortran-Comp.)

#10 Post by shahriar » Thu Oct 09, 2008 7:32 am

We successfully compiled parallel VASP on a machine similar to Meister Krause's. Thanks to all for the suggestions and queries.

I attempted to run the benchmark and got a weird error message (below). Can you please help me decipher it and explain how to resolve this patent-related issue? Thanks!

| Recently Corning got a patent for the Teter Allan Payne algorithm |
| therefore VASP.4.5 does not support IALGO=8 any longer |
| a much faster algorithm, IALGO=38, is now implemented in VASP |
| this algorithm is a blocked Davidson like method and as reliable as |
| IALGO=8 used to be |
| for ultimate performance IALGO=48 is still the method of choice |
| -- SO MUCH ABOUT PATENTS :) |
| |
| ----> I REFUSE TO CONTINUE WITH THIS SICK JOB ..., BYE!!! <----
Last edited by shahriar on Thu Oct 09, 2008 7:32 am, edited 1 time in total.

Meister Krause

Parallel Vasp successfully compiled (Intel x86_64, 16 core, OpenMPI, GotoBLAS, Intel-Fortran-Comp.)

#11 Post by Meister Krause » Fri Oct 10, 2008 2:22 pm

To yige:

Sorry, I don't know how to help you. I don't even know what the admin means by turning off hyperthreading, sorry.

To shahriar:

Congratulations!
Well, just do what the program tells you: change the algorithm type in the INCAR file from 8 to 38 or 48, and it should work.
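In INCAR terms, a minimal sketch of the line to change:

```
IALGO = 38    ! blocked Davidson, as the message suggests; or 48 for RMM-DIIS
```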
Last edited by Meister Krause on Fri Oct 10, 2008 2:22 pm, edited 1 time in total.

farzaneh
Newbie
Posts: 21
Joined: Fri Jul 10, 2009 10:00 am

Parallel Vasp successfully compiled (Intel x86_64, 16 core, OpenMPI, GotoBLAS, Intel-Fortran-Comp.)

#12 Post by farzaneh » Mon Apr 14, 2014 1:06 pm

Dear Krause,

I set up the Makefile using your suggestions, but I got the following errors. Can you help me with this issue?

gfortran: error: byterecl: No such file or directory
gfortran: error: unrecognized command line option '-assume'
gfortran: error: unrecognized command line option '-tpp7'

Thanks
Last edited by farzaneh on Mon Apr 14, 2014 1:06 pm, edited 1 time in total.
