Issue with NPAR
-
- Newbie
- Posts: 3
- Joined: Thu Mar 04, 2021 2:48 pm
Issue with NPAR
Hi,
With the newly compiled VASP executable, whenever I set an NPAR value in the INCAR, I get errors like the one below. If I don't set NPAR, the simulation runs smoothly, but at only 1/6 to 1/10 of the speed. I asked the HPC support staff, and they say the problem is not on their end.
Error when NPAR is set:
==========================================================================
[cn275:2585 :0:2585] Caught signal 11 (Segmentation fault: address not mapped to object at address 0xfffffffc96ba5a80)
Abort(68273934) on node 24 (rank 24 in comm 0): Fatal error in PMPI_Recv: Message truncated, error stack:
PMPI_Recv(173)........: MPI_Recv(buf=0xc29b120, count=0, MPI_BYTE, src=94, tag=9, comm=0xc4000011, status=0x7ffeae862bc0) failed
MPID_Recv(590)........:
MPIDI_recv_unsafe(205):
(unknown)(): Message truncated
==========================================================================
script.sh looks like this:
==========================================================================
#!/bin/csh
#SBATCH --job-name=test
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=48
#SBATCH --time=03:25:00
#SBATCH --output=slurm-%A.out
#SBATCH --error=slurm-%A.err
#SBATCH --partition=small
cd $SLURM_SUBMIT_DIR
source /opt/ohpc/admin/lmod/8.1.18/init/csh
module load spack
source /home-ext/apps/spack/share/spack/setup-env.csh
spack load intel-mkl@2020.4.304 /fet6h2j
spack load intel-mpi@2019.10.317 /6icwzn3
spack load fftw@3.3.10%intel@2021.5.0 /tphl5ba
unlimit
#ulimit -s unlimited
mpiexec.hydra -n $SLURM_NTASKS /home/proj/21/chemoh/vasp/vasp.5.4.4.pl2/bin/vasp_std
==========================================================================
And the INCAR looks like this:
==========================================================================
# Parameters related to accuracy of the simulation
PREC= Normal # Precision of the calculation
ALGO= Normal # Selects the block Davidson diagonalization algorithm
LREAL= .FALSE. # Evaluation of projection operators in reciprocal space
ENCUT= 400 # Plane wave cutoff in eV
EDIFF= 1E-5 # Stop the electronic loop when the energy change is below 1E-5 eV
# Accelerating convergence through electronic smearing
ISMEAR= 1 # Methfessel-Paxton smearing (order 1) to accelerate convergence
SIGMA= 0.2 # Smearing width in eV
# Spin polarization setting
ISPIN= 2 # Spin-polarized calculation, i.e., taking spin into account
#LDIPOL = .TRUE.
#IDIPOL=3
# Output settings
LCHARG= .TRUE. # Write CHGCAR
LWAVE= .TRUE. # Write WAVECAR
# Parallelization options
NPAR= 12 # Number of bands that are treated in parallel; NPAR ~ sqrt(number of cores)
# Exchange-correlation functional settings
GGA= PE # Chooses PBE XC functional
IVDW= 12 # Adds dispersion to DFT using Grimme's D3 method, with Becke-Johnson (BJ) damping, see: 10.1021/jp501237c
# Cell optimization details
IBRION= 2 # Optimize ion positions
EDIFFG= -0.05 # Stop optimization when forces on all atoms are less than 0.05 eV/A
NSW= 500 # Number of optimization steps to carry out
==========================================================================
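For completeness, my reasoning for NPAR = 12 (just a back-of-the-envelope check against the sqrt rule of thumb in the comment above, not taken from any machine-specific documentation) is sketched below:
==========================================================================
#!/bin/csh
# 2 nodes x 48 tasks per node = 96 MPI ranks; sqrt(96) is roughly 9.8,
# and NPAR should divide the rank count, so NPAR = 12 leaves 96 / 12 = 8
# ranks working together on each band.
@ ranks_per_band = 96 / 12
echo "ranks per band with NPAR = 12: $ranks_per_band"
==========================================================================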
-
- Global Moderator
- Posts: 501
- Joined: Mon Nov 04, 2019 12:41 pm
- Contact:
Re: Issue with NPAR
Could you share the POSCAR, KPOINTS, and POTCAR files as well?
You say that without NPAR the code runs without problems but slower; to which run are you comparing?
Did you try using different values of NPAR?
-
- Newbie
- Posts: 3
- Joined: Thu Mar 04, 2021 2:48 pm
Re: Issue with NPAR
Hi,
I had accidentally deleted some files, so I ran the simulation once again. Please find all the relevant files attached in the zip archive.
-
- Global Moderator
- Posts: 501
- Joined: Mon Nov 04, 2019 12:41 pm
- Contact:
Re: Issue with NPAR
Looking at your OUTCAR I see the following error:
==========================================================================
Intel MKL ERROR: Parameter 6 was incorrect on entry to DSTEIN2
==========================================================================
An online search shows other people having the same issue:
https://community.intel.com/t5/Intel-on ... -p/1143852
Given the version of your MKL module (intel-mkl@2020.4.304), and judging by the recommendation in the link above, I would guess the problem is already fixed in that release.
Perhaps your vasp executable is picking up a different MKL library than the one you load through the module.
You can check that using:
ldd <path to vasp_std>/vasp_std
You might also consider recompiling VASP and linking against your own compiled version of ScaLAPACK:
https://github.com/Reference-ScaLAPACK/scalapack/
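For example, something along these lines (only a sketch; the spack load lines are copied from your script.sh and will change if the package hashes change):
==========================================================================
#!/bin/csh
# Sketch: load the same spack packages as in script.sh, then check which
# MKL / ScaLAPACK libraries the VASP binary actually resolves to.
source /home-ext/apps/spack/share/spack/setup-env.csh
spack load intel-mkl@2020.4.304 /fet6h2j
spack load intel-mpi@2019.10.317 /6icwzn3
ldd /home/proj/21/chemoh/vasp/vasp.5.4.4.pl2/bin/vasp_std | grep -iE 'mkl|scalapack'
==========================================================================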
-
- Newbie
- Posts: 3
- Joined: Thu Mar 04, 2021 2:48 pm
Re: Issue with NPAR
Hi,
When I use ldd /home/proj/21/chemoh/vasp/vasp.5.4.4.pl2/bin/vasp_std, I find the following:
linux-vdso.so.1 => (0x00007ffe8e3c4000)
libmkl_intel_lp64.so => not found
libmkl_sequential.so => not found
libmkl_core.so => not found
libstdc++.so.6 => /lib64/libstdc++.so.6 (0x00007fb47dc97000)
libmkl_blacs_intelmpi_lp64.so => not found
libmkl_scalapack_lp64.so => not found
libmpifort.so.12 => /opt/ohpc/pub/compiler/intel/2018_2/compilers_and_libraries_2018.2.199/linux/mpi/intel64/lib/libmpifort.so.12 (0x00007fb47d8ee000)
libmpi.so.12 => /opt/ohpc/pub/compiler/intel/2018_2/compilers_and_libraries_2018.2.199/linux/mpi/intel64/lib/release_mt/libmpi.so.12 (0x00007fb47cc69000)
libdl.so.2 => /lib64/libdl.so.2 (0x00007fb47ca65000)
librt.so.1 => /lib64/librt.so.1 (0x00007fb47c85d000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fb47c641000)
libm.so.6 => /lib64/libm.so.6 (0x00007fb47c33f000)
libc.so.6 => /lib64/libc.so.6 (0x00007fb47bf71000)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007fb47bd5b000)
/lib64/ld-linux-x86-64.so.2 (0x00007fb47df9f000)
The link you gave says to use MKL newer than 2019.1, but I already use intel-mkl@2020.4.304. Can you identify the problem? Am I missing something?
-
- Global Moderator
- Posts: 501
- Joined: Mon Nov 04, 2019 12:41 pm
- Contact:
Re: Issue with NPAR
The output of your ldd command does not show which libraries are actually being used:
libmkl_scalapack_lp64.so => not found
If you are on a cluster, you need to run this command on a node where the modules are available.
I don't know your setup, but I would guess a compute node.
I would also expect that with the MKL version you are using the issue should already be solved, yet you are still getting the error.
My advice would be to try linking against your own compiled version of ScaLAPACK.
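One way to do that (only a sketch; the exact srun options depend on how your site is configured) is to grab a short interactive allocation on the same partition and repeat the check there:
==========================================================================
# Sketch: open an interactive shell on a compute node, then, inside that shell,
# load the same spack packages as in script.sh and re-run ldd.
srun --partition=small --nodes=1 --ntasks=1 --time=00:10:00 --pty /bin/csh
source /home-ext/apps/spack/share/spack/setup-env.csh
spack load intel-mkl@2020.4.304 /fet6h2j
spack load intel-mpi@2019.10.317 /6icwzn3
ldd /home/proj/21/chemoh/vasp/vasp.5.4.4.pl2/bin/vasp_std | grep 'not found'
==========================================================================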
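A rough outline of what that could look like (untested on my side; it assumes CMake, a BLAS/LAPACK such as MKL, and MPI compiler wrappers are available, and the variable names in makefile.include may differ between VASP versions):
==========================================================================
# Sketch: build the reference ScaLAPACK with CMake and install it under $HOME/libs
git clone https://github.com/Reference-ScaLAPACK/scalapack.git
cd scalapack
mkdir build
cd build
cmake .. -DCMAKE_INSTALL_PREFIX=$HOME/libs/scalapack -DBUILD_SHARED_LIBS=ON
make -j 8
make install
# In VASP's makefile.include, point the ScaLAPACK line at the new library,
# for example (the exact variable name depends on your makefile.include template):
#   SCALAPACK = -L$(HOME)/libs/scalapack/lib -lscalapack
# then rebuild vasp_std from a clean build directory.
==========================================================================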