
version 5.4.4, segmentation fault

Posted: Thu Jun 14, 2018 12:32 pm
by wpiskorz
Hello,

I compiled VASP 5.4.4 with the following settings:
* Intel compiler v15.0.2.164 with MKL
* openmpi-2.1.0 (following advice posted in this forum; I could not compile VASP with mpich2)

I wanted to run the XANES_in_Diamond example with the original input files. The calculation started but crashed, so I ran VASP under gdb. Here is the log:
------------------------------------------------------------------------------------------------------------------------

Starting program: /usr/local/src/vasp/vasp.5.4.4-13.06.2018/vasp.5.4.4/bin/vasp.5.4.4-13.06.2018 
[Thread debugging using libthread_db enabled]
Detaching after fork from child process 27239.
[New Thread 0x2aaab3820700 (LWP 27246)]
[New Thread 0x2aaab46b4700 (LWP 27247)]
 running on    1 total cores
 distrk:  each k-point on    1 cores,    1 groups
 distr:  one band on    1 cores,    1 groups
 using from now: INCAR     
 vasp.5.4.4.18Apr17-6-g9f103f2a35 (build Jun 13 2018 21:27:17) complex          
  
 POSCAR found type information on POSCAR  Co O 
 POSCAR found :  2 types and      56 ions
 scaLAPACK will be used
 LDA part: xc-table for Pade appr. of Perdew
 POSCAR, INCAR and KPOINTS ok, starting setup
 FFT: planning ...
 WAVECAR not read
 WARNING: random wavefunctions but no delay for mixing, default for NELMDL
 entering main loop
       N       E                     dE             d eps       ncg     rms          rms(c)

Program received signal SIGSEGV, Segmentation fault.
0x00002aaab0b8c097 in PMPI_Comm_size () from /usr/local/openmpi-2.1.0-intel-v15.0.2.164/lib/libmpi.so.20
(gdb) bt
#0  0x00002aaab0b8c097 in PMPI_Comm_size () from /usr/local/openmpi-2.1.0-intel-v15.0.2.164/lib/libmpi.so.20
#1  0x00002aaab0129482 in ilp64_Cblacs_pinfo () from /opt/intel/composer_xe_2015.2.164/mkl/lib/intel64/libmkl_blacs_intelmpi_ilp64.so
#2  0x00002aaab011b759 in ilp64_blacs_gridmap_ () from /opt/intel/composer_xe_2015.2.164/mkl/lib/intel64/libmkl_blacs_intelmpi_ilp64.so
#3  0x00002aaab011b151 in ilp64_blacs_gridinit_ () from /opt/intel/composer_xe_2015.2.164/mkl/lib/intel64/libmkl_blacs_intelmpi_ilp64.so
#4  0x00002aaab014384b in blacs_gridinit__ () from /opt/intel/composer_xe_2015.2.164/mkl/lib/intel64/libmkl_blacs_intelmpi_ilp64.so
#5  0x000000000043cc33 in procmap (comm=Cannot access memory at address 0x44000000
) at scala.F:583
#6  init_scala_desc (comm=Cannot access memory at address 0x44000000
) at scala.F:413
#7  init_scala (comm=Cannot access memory at address 0x44000000
) at scala.F:321
#8  scala::pdssyex_zheevx (comm=Cannot access memory at address 0x44000000
) at scala.F:847
#9  0x0000000000e427c0 in david::eddav (hamiltonian=Cannot access memory at address 0x44000000
) at davidson.F:863
#10 0x0000000000eacd36 in elmin (hamiltonian=Cannot access memory at address 0x44000000
) at electron.F:424
#11 0x00000000015582da in electronic_optimization () at main.F:4745
#12 0x00000000015359b5 in vamp () at main.F:2792
#13 0x000000000040989e in main ()
(gdb)
------------------------------------------------------------------------------------------------------------------------

Here is the output of ldd for the VASP binary:

        linux-vdso.so.1 =>  (0x00002b3b6fe09000)
	libmkl_intel_lp64.so => /opt/intel/composer_xe_2015.2.164/mkl/lib/intel64/libmkl_intel_lp64.so (0x00002b3b6fe0c000)
	libmkl_intel_thread.so => /opt/intel/composer_xe_2015.2.164/mkl/lib/intel64/libmkl_intel_thread.so (0x00002b3b7071f000)
	libmkl_core.so => /opt/intel/composer_xe_2015.2.164/mkl/lib/intel64/libmkl_core.so (0x00002b3b71b40000)
	libiomp5.so => /opt/intel/composer_xe_2015.2.164/compiler/lib/intel64/libiomp5.so (0x00002b3b7369f000)
	libmkl_intel_ilp64.so => /opt/intel/composer_xe_2015.2.164/mkl/lib/intel64/libmkl_intel_ilp64.so (0x00002b3b739db000)
	libmkl_scalapack_ilp64.so => /opt/intel/composer_xe_2015.2.164/mkl/lib/intel64/libmkl_scalapack_ilp64.so (0x00002b3b7428e000)
	libmkl_sequential.so => /opt/intel/composer_xe_2015.2.164/mkl/lib/intel64/libmkl_sequential.so (0x00002b3b74b8c000)
	libmkl_blacs_intelmpi_ilp64.so => /opt/intel/composer_xe_2015.2.164/mkl/lib/intel64/libmkl_blacs_intelmpi_ilp64.so (0x00002b3b75463000)
	libpthread.so.0 => /lib64/libpthread.so.0 (0x00000038d8000000)
	libm.so.6 => /lib64/libm.so.6 (0x00000038d7800000)
	libdl.so.2 => /lib64/libdl.so.2 (0x00000038d7c00000)
	libstdc++.so.6 => /usr/lib64/libstdc++.so.6 (0x00000038de800000)
	libmpi_usempif08.so.20 => /usr/local/openmpi-2.1.0-intel-v15.0.2.164/lib/libmpi_usempif08.so.20 (0x00002b3b756f8000)
	libmpi_usempi_ignore_tkr.so.20 => /usr/local/openmpi-2.1.0-intel-v15.0.2.164/lib/libmpi_usempi_ignore_tkr.so.20 (0x00002b3b759a6000)
	libmpi_mpifh.so.20 => /usr/local/openmpi-2.1.0-intel-v15.0.2.164/lib/libmpi_mpifh.so.20 (0x00002b3b75c2e000)
	libmpi.so.20 => /usr/local/openmpi-2.1.0-intel-v15.0.2.164/lib/libmpi.so.20 (0x00002b3b75e8b000)
	libc.so.6 => /lib64/libc.so.6 (0x00000038d7400000)
	libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00000038de000000)
	/lib64/ld-linux-x86-64.so.2 (0x00000038d7000000)
	libopen-rte.so.20 => /usr/local/openmpi-2.1.0-intel-v15.0.2.164/lib/libopen-rte.so.20 (0x00002b3b76197000)
	libopen-pal.so.20 => /usr/local/openmpi-2.1.0-intel-v15.0.2.164/lib/libopen-pal.so.20 (0x00002b3b76431000)
	librt.so.1 => /lib64/librt.so.1 (0x00000038d8800000)
	libutil.so.1 => /lib64/libutil.so.1 (0x00000038e6c00000)
	libifport.so.5 => /opt/intel/composer_xe_2015.2.164/compiler/lib/intel64/libifport.so.5 (0x00002b3b76750000)
	libimf.so => /opt/intel/composer_xe_2015.2.164/compiler/lib/intel64/libimf.so (0x00002b3b7697e000)
	libintlc.so.5 => /opt/intel/composer_xe_2015.2.164/compiler/lib/intel64/libintlc.so.5 (0x00002b3b76e39000)
	libsvml.so => /opt/intel/composer_xe_2015.2.164/compiler/lib/intel64/libsvml.so (0x00002b3b77094000)
	libirng.so => /opt/intel/composer_xe_2015.2.164/compiler/lib/intel64/libirng.so (0x00002b3b77f68000)
------------------------------------------------------------------------------------------------------------------------
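The ldd listing above can be screened for link-line mismatches with a short script. The following is a minimal Python sketch, assuming MKL's usual library naming (libmkl_<interface>_<lp64|ilp64>, libmkl_blacs_<mpi>_<lp64|ilp64>); `check_ldd` is a hypothetical helper, not part of VASP or MKL. It flags three things that appear in this listing: LP64 and ILP64 MKL libraries loaded together (libmkl_intel_lp64 next to libmkl_intel_ilp64 and libmkl_scalapack_ilp64), two MKL threading layers (libmkl_intel_thread and libmkl_sequential), and the Intel MPI flavour of BLACS (libmkl_blacs_intelmpi_ilp64) in a binary that also loads Open MPI. Intel MPI and Open MPI are not ABI-compatible, so the last point would be consistent with the crash in PMPI_Comm_size called from Cblacs_pinfo, but that is an inference from the backtrace, not a confirmed diagnosis.

```python
from __future__ import annotations

import re
import sys

# Sketch only: these patterns assume MKL's usual library naming and are
# not an official MKL compatibility check.
IFACE = re.compile(r"libmkl_(?:intel|gf)_(lp64|ilp64)\.so")
SCALAPACK = re.compile(r"libmkl_scalapack_(lp64|ilp64)\.so")
BLACS = re.compile(r"libmkl_blacs_(\w+?)_(lp64|ilp64)\.so")
THREADING = re.compile(r"libmkl_(intel_thread|gnu_thread|tbb_thread|sequential)\.so")


def check_ldd(text: str) -> list[str]:
    """Return warnings for suspicious MKL/MPI combinations in `ldd` output."""
    warnings = []

    # Integer model of each MKL layer: LP64 = 32-bit ints, ILP64 = 64-bit ints.
    models = set(IFACE.findall(text)) | set(SCALAPACK.findall(text))
    blacs = set(BLACS.findall(text))  # {(mpi_flavour, model), ...}
    models |= {model for _, model in blacs}
    if len(models) > 1:
        warnings.append("mixes LP64 and ILP64 MKL libraries: " + ", ".join(sorted(models)))

    # Only one MKL threading layer should be loaded.
    layers = set(THREADING.findall(text))
    if len(layers) > 1:
        warnings.append("more than one MKL threading layer: " + ", ".join(sorted(layers)))

    # Open MPI is recognisable by its runtime support libraries.
    uses_openmpi = "libopen-pal" in text or "libopen-rte" in text
    for flavour, _ in sorted(blacs):
        if uses_openmpi and flavour != "openmpi":
            warnings.append(f"Open MPI is linked, but BLACS is built for '{flavour}'")
    return warnings


if __name__ == "__main__":
    # Usage: ldd /path/to/vasp > ldd.txt && python check_ldd.py ldd.txt
    for path in sys.argv[1:]:
        with open(path) as fh:
            for warning in check_ldd(fh.read()):
                print(f"{path}: WARNING: {warning}")
```

Saved as check_ldd.py, it prints one WARNING line per finding; with a consistent link line (for example a single LP64 interface, one threading layer, and the openmpi flavour of BLACS next to Open MPI) it prints nothing.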

Does anyone know whether the problem lies with Open MPI (the SIGSEGV was raised in libmpi.so.20) or rather with MKL?
Help, please! :-)
Regards,
Witold

Re: version 5.4.4, segmentation fault

Posted: Sun Jun 17, 2018 8:31 pm
by wpiskorz
Dear All,

Some news:
I tried compiling VASP with impi (Intel MPI), again with the debug flag, but the runtime error (SIGSEGV) persists. The error message is:

{ 0, 0}: On entry to
DESCINIT parameter number 6 had an illegal value
internal error in INIT_SCALA: DESCA, DESCINIT, INFO: -6
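The `-6` refers to the sixth argument of DESCINIT(DESC, M, N, MB, NB, IRSRC, ICSRC, ICTXT, LLD, INFO), i.e. IRSRC, the process row that owns the first row of the matrix. The sketch below models, in Python, the argument checks of reference ScaLAPACK's DESCINIT, assuming MKL performs the same checks in the same order; `descinit_info` is a hypothetical helper and does not call ScaLAPACK. It illustrates one possible reading of the error: if BLACS does not recognise the context (for instance, assuming VASP was built with default 4-byte INTEGERs, because those are passed to the ILP64 BLACS/ScaLAPACK libraries, which expect 8-byte ones), BLACS_GRIDINFO reports NPROW = -1, and the IRSRC test then fails before the dedicated context check (INFO = -8) is ever reached.

```python
def numroc(n: int, nb: int, iproc: int, isrcproc: int, nprocs: int) -> int:
    """Number of rows (or columns) of an n-long, nb-blocked dimension on process iproc."""
    mydist = (nprocs + iproc - isrcproc) % nprocs
    nblocks = n // nb
    count = (nblocks // nprocs) * nb
    extrablks = nblocks % nprocs
    if mydist < extrablks:
        count += nb
    elif mydist == extrablks:
        count += n % nb
    return count


def descinit_info(m: int, n: int, mb: int, nb: int, irsrc: int, icsrc: int,
                  nprow: int, npcol: int, myrow: int, lld: int) -> int:
    """Return the INFO value DESCINIT would report (0 means the arguments are accepted).

    nprow, npcol and myrow stand in for what BLACS_GRIDINFO returns for the
    context; for an invalid context they are all -1.
    """
    if m < 0:
        return -2
    if n < 0:
        return -3
    if mb < 1:
        return -4
    if nb < 1:
        return -5
    # With nprow == -1 this test fails for every irsrc, so INFO = -6 is reported.
    if irsrc < 0 or irsrc >= nprow:
        return -6
    if icsrc < 0 or icsrc >= npcol:
        return -7
    if nprow == -1:
        return -8
    if lld < max(1, numroc(m, mb, myrow, irsrc, nprow)):
        return -9
    return 0


if __name__ == "__main__":
    # Invalid BLACS context: reported as "parameter number 6".
    print(descinit_info(100, 100, 16, 16, 0, 0, -1, -1, -1, 100))
    # Valid 1x1 process grid: arguments accepted.
    print(descinit_info(100, 100, 16, 16, 0, 0, 1, 1, 0, 100))
```

Run as a script, it prints -6 for the invalid-context case and 0 for the valid 1x1 grid.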

So I decided to build without scaLAPACK (removed -DscaLAPACK), and now VASP seems to work properly (still testing). I have no idea whether disabling scaLAPACK carries a significant speed penalty. Can anyone comment on this, please?

With best regards,
Witold