Issue with NPAR
-
- Newbie
- Posts: 3
- Joined: Thu Mar 04, 2021 2:48 pm
Issue with NPAR
Hi,
With the newly compiled VASP executable, whenever I set an NPAR value in the INCAR, I get errors like the one below. If I don't set NPAR, the simulation runs smoothly, but at only 1/6 to 1/10 of the speed. I asked the HPC support staff, and they say the problem is not on their end.
Error when NPAR is set:
==========================================================================
[cn275:2585 :0:2585] Caught signal 11 (Segmentation fault: address not mapped to object at address 0xfffffffc96ba5a80)
Abort(68273934) on node 24 (rank 24 in comm 0): Fatal error in PMPI_Recv: Message truncated, error stack:
PMPI_Recv(173)........: MPI_Recv(buf=0xc29b120, count=0, MPI_BYTE, src=94, tag=9, comm=0xc4000011, status=0x7ffeae862bc0) failed
MPID_Recv(590)........:
MPIDI_recv_unsafe(205):
(unknown)(): Message truncated
==========================================================================
script.sh looks like this:
==========================================================================
#!/bin/csh
#SBATCH --job-name=test
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=48
#SBATCH --time=03:25:00
#SBATCH --output=slurm-%A.out
#SBATCH --error=slurm-%A.err
#SBATCH --partition=small
cd $SLURM_SUBMIT_DIR
source /opt/ohpc/admin/lmod/8.1.18/init/csh
module load spack
source /home-ext/apps/spack/share/spack/setup-env.csh
spack load intel-mkl@2020.4.304 /fet6h2j
spack load intel-mpi@2019.10.317 /6icwzn3
spack load fftw@3.3.10%intel@2021.5.0 /tphl5ba
unlimit
#ulimit -s unlimited
mpiexec.hydra -n $SLURM_NTASKS /home/proj/21/chemoh/vasp/vasp.5.4.4.pl2/bin/vasp_std
==========================================================================
And the INCAR looks like this:
==========================================================================
# Parameters related to accuracy of the simulation
PREC= Normal # Precision of the calculation
ALGO= Normal # Selects the block Davidson diagonalization algorithm
LREAL= .FALSE. # Evaluation of projection operators in reciprocal space
ENCUT= 400 # Plane wave cutoff in eV
EDIFF= 1E-5 # Stop the electronic loop when the energy change is below 1E-5 eV
# Accelerating convergence through electronic smearing
ISMEAR= 1 # Methfessel-Paxton smearing (order 1) to accelerate convergence
SIGMA= 0.2 # Smearing width in eV
# Spin polarization setting
ISPIN= 2 # Spin-polarized calculation, i.e., taking spin into account
#LDIPOL = .TRUE.
#IDIPOL=3
# Output settings
LCHARG= .TRUE. # Write CHGCAR
LWAVE= .TRUE. # Write WAVECAR
# Parallelization options
NPAR= 12 # Number of bands that are treated in parallel; NPAR ~ sqrt(number of cores)
# Exchange-correlation functional settings
GGA= PE # Chooses PBE XC functional
IVDW= 12 # Adds dispersion to DFT using Grimme's D3 method, with Becke-Johnson (BJ) damping, see: 10.1021/jp501237c
# Cell optimization details
IBRION= 2 # Optimize ion positions
EDIFFG= -0.05 # Stop optimization when forces on all atoms are less than 0.05 eV/A
NSW= 500 # Number of optimization steps to carry out
==========================================================================
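For completeness, my reasoning for NPAR = 12 (just a back-of-the-envelope check against the sqrt rule of thumb in the comment above, not taken from any machine-specific documentation) is sketched below:
==========================================================================
#!/bin/csh
# 2 nodes x 48 tasks per node = 96 MPI ranks; sqrt(96) is roughly 9.8,
# and NPAR should divide the rank count, so NPAR = 12 leaves 96 / 12 = 8
# ranks working together on each band.
@ ranks_per_band = 96 / 12
echo "ranks per band with NPAR = 12: $ranks_per_band"
==========================================================================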
-
- Global Moderator
- Posts: 501
- Joined: Mon Nov 04, 2019 12:41 pm
- Contact:
Re: Issue with NPAR
Could you share the POSCAR, KPOINTS, and POTCAR files as well?
You say that without NPAR the code runs without problems but slower; to which run are you comparing?
Did you try using different values of NPAR?
-
- Newbie
- Posts: 3
- Joined: Thu Mar 04, 2021 2:48 pm
Re: Issue with NPAR
Hi,
I had accidentally deleted some files, so I ran the simulation once again. Please find all the relevant files attached in the zip archive.
-
- Global Moderator
- Posts: 501
- Joined: Mon Nov 04, 2019 12:41 pm
- Contact:
Re: Issue with NPAR
Looking at your OUTCAR I see the following error:
==========================================================================
Intel MKL ERROR: Parameter 6 was incorrect on entry to DSTEIN2
==========================================================================
An online search shows other people having the same issue:
https://community.intel.com/t5/Intel-on ... -p/1143852
Given the version of your MKL module (intel-mkl@2020.4.304), and judging by the recommendation in the link above, I would guess the problem is already fixed in that release.
Perhaps your vasp executable is picking up a different MKL library than the one you load through the module.
You can check that using:
ldd <path to vasp_std>/vasp_std
You might also consider recompiling VASP and linking against your own compiled version of ScaLAPACK:
https://github.com/Reference-ScaLAPACK/scalapack/
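For example, something along these lines (only a sketch; the spack load lines are copied from your script.sh and will change if the package hashes change):
==========================================================================
#!/bin/csh
# Sketch: load the same spack packages as in script.sh, then check which
# MKL / ScaLAPACK libraries the VASP binary actually resolves to.
source /home-ext/apps/spack/share/spack/setup-env.csh
spack load intel-mkl@2020.4.304 /fet6h2j
spack load intel-mpi@2019.10.317 /6icwzn3
ldd /home/proj/21/chemoh/vasp/vasp.5.4.4.pl2/bin/vasp_std | grep -iE 'mkl|scalapack'
==========================================================================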
-
- Newbie
- Posts: 3
- Joined: Thu Mar 04, 2021 2:48 pm
Re: Issue with NPAR
Hi,
When I use ldd /home/proj/21/chemoh/vasp/vasp.5.4.4.pl2/bin/vasp_std, I find the following:
linux-vdso.so.1 => (0x00007ffe8e3c4000)
libmkl_intel_lp64.so => not found
libmkl_sequential.so => not found
libmkl_core.so => not found
libstdc++.so.6 => /lib64/libstdc++.so.6 (0x00007fb47dc97000)
libmkl_blacs_intelmpi_lp64.so => not found
libmkl_scalapack_lp64.so => not found
libmpifort.so.12 => /opt/ohpc/pub/compiler/intel/2018_2/compilers_and_libraries_2018.2.199/linux/mpi/intel64/lib/libmpifort.so.12 (0x00007fb47d8ee000)
libmpi.so.12 => /opt/ohpc/pub/compiler/intel/2018_2/compilers_and_libraries_2018.2.199/linux/mpi/intel64/lib/release_mt/libmpi.so.12 (0x00007fb47cc69000)
libdl.so.2 => /lib64/libdl.so.2 (0x00007fb47ca65000)
librt.so.1 => /lib64/librt.so.1 (0x00007fb47c85d000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fb47c641000)
libm.so.6 => /lib64/libm.so.6 (0x00007fb47c33f000)
libc.so.6 => /lib64/libc.so.6 (0x00007fb47bf71000)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007fb47bd5b000)
/lib64/ld-linux-x86-64.so.2 (0x00007fb47df9f000)
The link you gave says to use MKL newer than 2019.1, but I already use intel-mkl@2020.4.304. Can you identify the problem? Am I missing something?
-
- Global Moderator
- Posts: 501
- Joined: Mon Nov 04, 2019 12:41 pm
- Contact:
Re: Issue with NPAR
The output of your ldd command does not show which libraries are actually being used:
libmkl_scalapack_lp64.so => not found
If you are on a cluster, you need to run this command on a node where the modules are available.
I don't know your setup, but I would guess a compute node.
I would also expect that with the MKL version you are using the issue should already be solved, yet you are still getting the error.
My advice would be to try linking against your own compiled version of ScaLAPACK.
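One way to do that (only a sketch; the exact srun options depend on how your site is configured) is to grab a short interactive allocation on the same partition and repeat the check there:
==========================================================================
# Sketch: open an interactive shell on a compute node, then, inside that shell,
# load the same spack packages as in script.sh and re-run ldd.
srun --partition=small --nodes=1 --ntasks=1 --time=00:10:00 --pty /bin/csh
source /home-ext/apps/spack/share/spack/setup-env.csh
spack load intel-mkl@2020.4.304 /fet6h2j
spack load intel-mpi@2019.10.317 /6icwzn3
ldd /home/proj/21/chemoh/vasp/vasp.5.4.4.pl2/bin/vasp_std | grep 'not found'
==========================================================================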
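A rough outline of what that could look like (untested on my side; it assumes CMake, a BLAS/LAPACK such as MKL, and MPI compiler wrappers are available, and the variable names in makefile.include may differ between VASP versions):
==========================================================================
# Sketch: build the reference ScaLAPACK with CMake and install it under $HOME/libs
git clone https://github.com/Reference-ScaLAPACK/scalapack.git
cd scalapack
mkdir build
cd build
cmake .. -DCMAKE_INSTALL_PREFIX=$HOME/libs/scalapack -DBUILD_SHARED_LIBS=ON
make -j 8
make install
# In VASP's makefile.include, point the ScaLAPACK line at the new library,
# for example (the exact variable name depends on your makefile.include template):
#   SCALAPACK = -L$(HOME)/libs/scalapack/lib -lscalapack
# then rebuild vasp_std from a clean build directory.
==========================================================================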