error? in automatic determination of ntaupar in rpar calculations



kdoblhoff
Newbie
Posts: 30
Joined: Thu Feb 04, 2021 12:10 pm

error? in automatic determination of ntaupar in rpar calculations

#1 Post by kdoblhoff » Wed Mar 30, 2022 1:47 pm

I am performing an RPA calculation of a bulk metal in VASP 6.3 (one atom in the unit cell, 14x14x14 k-point grid) using the finite-temperature, low-scaling algorithm.

When not specifying NOMEGAPAR and NTAUPAR, I get the following output (note the choice of NTAUPAR, the required memory, and the memory warning):

Code: Select all

 NOMEGAPAR set to  8 based on MAXMEM
 MEMORY estimate per rank in MB for (frequency):       13.78

Looking for optimal time points distribution:
 NTAUPAR    req. mB/rank(max)    req. mb/rank(min)
       8              7755.1              7385.8
       4              8489.1              8084.8
       2              9474.8              9023.6
       1             10583.8             10079.8
       
[...]

|     This job will probably crash, due to insufficient memory available.     |
|     Available memory per mpi rank: 7300 MB, required memory: 11818 MB.      |
|     Reducing NTAUPAR or using more computing nodes might solve this         |
|     problem.                                                                |

[...]

 Maximum memory used (kb):     7204868.
The code runs through, either because it actually ends up using less memory than anticipated (as suggested by the "Maximum memory used" statement at the end of the OUTCAR) or because the calculation is run on 1/4 of a node and there actually IS more memory available - it is just that this memory should remain available for other jobs running on the same node.
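For reference, converting that reported peak from kB to MB (a rough check, assuming the OUTCAR value refers to a single rank):

Code: Select all

# peak from the OUTCAR above, kB -> MB (integer division)
echo $(( 7204868 / 1024 ))   # prints 7036 (MB), just under MAXMEM = 7300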

There are two things, though, that I do not understand:
  • Why does the memory usage INCREASE as NTAUPAR decreases (see the output above)?
  • I was astonished by the memory warning, since someone in my group had successfully performed the same calculation on a cluster with less memory using VASP 6.2 - without getting a warning. We checked his output and found that the code had set NTAUPAR to 2 and NOMEGAPAR to 4. I set those values manually (see the snippet below) and - voilà - the calculation ran through without a warning, also reporting a lower total memory requirement, though using about 5% more computational time. The reduced memory is clearly consistent with point 1, but if the memory requirement really does increase as NTAUPAR decreases in this case, why does VASP choose the option that uses the MOST memory, risking a crash due to insufficient memory, as the warning suggests?
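For completeness, these are the two tags I set manually in the INCAR for that run:

Code: Select all

 NTAUPAR = 2
 NOMEGAPAR = 4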
Thank you and best regards,
Katharina

merzuk.kaltak
Administrator
Posts: 282
Joined: Mon Sep 24, 2018 9:39 am

Re: error? in automatic determination of ntaupar in rpar calculations

#2 Post by merzuk.kaltak » Wed Mar 30, 2022 2:08 pm

Dear Katharina,

This might be a bug. The memory requirement is expected to go up with NTAUPAR. Output like the following is "normal":

Code: Select all

 NTAUPAR    req. mB/rank(max)    req. mb/rank(min)
      11             10130.1              9647.7
       2              3105.7              2957.8
       1              2331.1              2220.1
However, to be sure, I would need the INCAR, POSCAR, POTCAR, KPOINTS, and the OUTCAR of the problematic job (basically a proper bug report).

kdoblhoff
Newbie
Posts: 30
Joined: Thu Feb 04, 2021 12:10 pm

Re: error? in automatic determination of ntaupar in rpar calculations

#3 Post by kdoblhoff » Wed Mar 30, 2022 3:17 pm

Thank you for the reply.
Here is the full report:
Code: vasp.6.2.1 16May21 (build Oct 14 2021 10:26:40) complex
Compilation info as far as I can access it (VASP is preinstalled on the national supercomputer): foss-2021a-CUDA-11.3.1
Executed on: 1 node with 128 CPUs and 1 TB of memory in total, using 32 CPUs.
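For context, a back-of-the-envelope estimate of the per-core memory share on such a node (my own arithmetic, not VASP output):

Code: Select all

# 1 TB node with 128 cores: fair memory share per core, in MB
echo $(( 1048576 / 128 ))   # prints 8192; MAXMEM = 7300 stays safely below this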
Input:
INCAR:

Code: Select all

 a3.95773

 LWAVE = .FALSE.
 LCHARG = .FALSE.

 ALGO = RPAR
 LFINITE_TEMPERATURE = .TRUE.
 NOMEGA = 12
 ENCUT = 300
 EDIFF = 1e-08

 ISMEAR = -1
 SIGMA = 0.05


 LASPH = .TRUE.

 MAXMEM = 7300
KPOINTS:

Code: Select all

kpoints
0
Gammacentered
14 14 14
0.0 0.0 0.0
POSCAR:

Code: Select all

Al
 1.0000000000000000
     0.0000000000000000    1.9788667961468493    1.9788667961468493
     1.9788667961468493    0.0000000000000000    1.9788667961468493
     1.9788667961468493    1.9788667961468493    0.0000000000000000
 Al
   1
Cartesian
  0.0000000000000000  0.0000000000000000  0.0000000000000000
POTCAR: PAW Al_GW 19Mar2012

Thank you for taking a look!

merzuk.kaltak
Administrator
Posts: 282
Joined: Mon Sep 24, 2018 9:39 am

Re: error? in automatic determination of ntaupar in rpar calculations

#4 Post by merzuk.kaltak » Fri Apr 01, 2022 12:27 pm

Dear Katharina,

I have tried to reproduce this behavior on an AMD-based machine with 128 cores and 512 GB of memory (the largest node we have), running with 32 MPI ranks.
I found that the memory requirement is underestimated by roughly 120 GB, which is certainly a bug.
This is, for instance, the output I get with your INCAR:

Code: Select all

 NTAUPAR    req. mB/rank(max)    req. mb/rank(min)
       8              7713.9              7346.5
       4              7225.6              6881.5
So the code picks NTAUPAR = 4 for this run and estimates roughly 7300 MB of memory per rank.
Comparing this with the measured memory peak (at the end of the OUTCAR), I get:

Code: Select all

                   Maximum memory used (kb):    11516168.
This is about 4000 MB more per rank than estimated. It seems that the estimated RAM is valid only for a subset of ranks, but not for the first MPI rank (which always requires the largest amount of RAM).
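In numbers (a rough cross-check, taking the NTAUPAR = 4 maximum estimate of 7225.6 MB from the table above):

Code: Select all

# measured peak per rank, kB -> MB (integer division)
echo $(( 11516168 / 1024 ))   # prints 11246 (MB); 11246 - 7226 = 4020 MB above the estimate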
I will try to fix this.

In the meantime, I suggest you set NTAUPAR manually in the INCAR.
Note that NTAUPAR = 1 will be the slowest configuration and will use the smallest amount of memory possible for any RPAR job.
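For instance, the minimal addition to the INCAR would be a single line:

Code: Select all

 NTAUPAR = 1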

However, I do not observe that the calculated memory requirement increases with decreasing NTAUPAR.
To understand why this happens on your machine, it would be helpful if you could post the content of "/proc/meminfo" on that particular machine with 1 TB of memory.
This file shows the available memory; specifically, look for lines like these:

Code: Select all

MemFree:        12292892 kB
MemAvailable:   25677364 kB
I would like to know whether the unit kB is still used on a 1 TB machine.
Also, please let me know which compiler toolchain was used to compile VASP on that machine and what the "makefile.include" looks like.
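By the way, a quick way to grab exactly those lines (the fields are standard on any recent Linux kernel):

Code: Select all

grep -E 'MemTotal|MemFree|MemAvailable' /proc/meminfo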

merzuk.kaltak
Administrator
Posts: 282
Joined: Mon Sep 24, 2018 9:39 am

Re: error? in automatic determination of ntaupar in rpar calculations

#5 Post by merzuk.kaltak » Mon Apr 04, 2022 2:54 pm

Dear Katharina,

Just to let you know that in our latest version, 6.3.1, the required memory is calculated correctly.
For instance, using the same input files as above, the estimated memory requirement is 7126 MB per rank and the
maximum memory used (kb) is 6093604.
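Converted to MB, the measured peak is now safely below the estimate:

Code: Select all

# measured peak from the OUTCAR, kB -> MB (integer division)
echo $(( 6093604 / 1024 ))   # prints 5950 (MB), well below the 7126 MB estimate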
Here is the measured memory footprint, showing how much memory is consumed during the RPAR calculation.
mem-ntaupar4.png

kdoblhoff
Newbie
Posts: 30
Joined: Thu Feb 04, 2021 12:10 pm

Re: error? in automatic determination of ntaupar in rpar calculations

#6 Post by kdoblhoff » Mon Apr 11, 2022 6:56 am

Dear Merzuk,
Sorry for the delayed reply. I attach the makefile used on our cluster.
I have a question concerning your interim solution. You suggest setting NTAUPAR to 1, which would have been my first impulse too, but this is at odds with the fact that the calculation in which I manually set NTAUPAR = 2 and NOMEGAPAR = 4 ran through without a warning, while the one that automatically chose NTAUPAR = 1 did not. Compared to NTAUPAR = 1, the maximum memory reported also decreases for NTAUPAR = 2 with NOMEGAPAR = 4.

Concerning your other questions:
The memory is reported in kB in /proc/meminfo:

Code: Select all

-----------start meminfo----------
MemTotal:       1056640340 kB
MemFree:        810398244 kB
MemAvailable:   979494768 kB
The following makefile was used:

Code: Select all

# Precompiler options
CPP_OPTIONS= -DHOST=\"LinuxGNU\" \
             -DMPI -DMPI_BLOCK=8000 -Duse_collective \
             -DscaLAPACK \
             -DCACHE_SIZE=4000 \
             -Davoidalloc \
             -Dvasp6 \
             -Duse_bse_te \
             -Dtbdyn \
             -Dfock_dblbuf

CPP        = gcc -E -P -C -w $*$(FUFFIX) >$*$(SUFFIX) $(CPP_OPTIONS)

FC         = mpif90
FCL        = mpif90

FREE       = -ffree-form -ffree-line-length-none

FFLAGS     = -w -march=native -fallow-argument-mismatch
OFLAG      = -O2
OFLAG_IN   = $(OFLAG)
DEBUG      = -O0

BLAS       = -L$(EBROOTOPENBLAS)/lib -lopenblas
LAPACK     =
BLACS      = 
SCALAPACK  = -L$(EBROOTSCALAPACK)/lib -lscalapack $(BLACS)

LLIBS      = $(SCALAPACK) $(LAPACK) $(BLAS)

FFTW       ?= $(EBROOTFFTW)
LLIBS      += -L$(FFTW)/lib -lfftw3
INCS       = -I$(FFTW)/include

OBJECTS    = fftmpiw.o fftmpi_map.o  fftw3d.o  fft3dlib.o

OBJECTS_O1 += fftw3d.o fftmpi.o fftmpiw.o
OBJECTS_O2 += fft3dlib.o

# For what used to be vasp.5.lib
CPP_LIB    = $(CPP)
FC_LIB     = $(FC)
CC_LIB     = gcc
CFLAGS_LIB = -O
FFLAGS_LIB = -O1
FREE_LIB   = $(FREE)

OBJECTS_LIB= linpack_double.o getshmem.o

# For the parser library
CXX_PARS   = g++
LLIBS      += -lstdc++


# Normally no need to change this
SRCDIR     = ../../src
BINDIR     = ../../bin

#================================================
# GPU Stuff

CPP_GPU    = -DCUDA_GPU -DRPROMU_CPROJ_OVERLAP -DCUFFT_MIN=28 -UscaLAPACK -Ufock_dblbuf # -DUSE_PINNED_MEMORY 

OBJECTS_GPU= fftmpiw.o fftmpi_map.o fft3dlib.o fftw3d_gpu.o fftmpiw_gpu.o

CC         = gcc
CXX        = g++
CFLAGS     = -fPIC -DADD_ -fopenmp -DMAGMA_WITH_MKL -DMAGMA_SETAFFINITY -DGPUSHMEM=300 -DHAVE_CUBLAS

# Minimal requirement is CUDA >= 10.X. For "sm_80" you need CUDA >= 11.X.
CUDA_ROOT  ?= $(EBROOTCUDACORE)
NVCC       := $(CUDA_ROOT)/bin/nvcc
CUDA_LIB   := -L$(CUDA_ROOT)/lib64 -lnvToolsExt -lcudart -lcuda -lcufft -lcublas

GENCODE_ARCH    := -gencode=arch=compute_35,code=\"sm_35,compute_35\" \
                   -gencode=arch=compute_60,code=\"sm_60,compute_60\" \
                   -gencode=arch=compute_70,code=\"sm_70,compute_70\" \
                   -gencode=arch=compute_75,code=\"sm_75,compute_75\"

MPI_INC    = $(EBROOTOPENMPI)/include
Thank you for having a look,
Katharina

merzuk.kaltak
Administrator
Posts: 282
Joined: Mon Sep 24, 2018 9:39 am

Re: error? in automatic determination of ntaupar in rpar calculations

#7 Post by merzuk.kaltak » Mon Apr 11, 2022 8:09 am

Dear Katharina,

Thank you for the report.
Of course you can use NTAUPAR = 2 if the calculation fits into memory.
This bug should be fixed in 6.3.0, and the memory prediction should be accurate there.
Could you run the job with VASP 6.3.0 on your cluster?

kdoblhoff
Newbie
Newbie
Posts: 30
Joined: Thu Feb 04, 2021 12:10 pm

Re: error? in automatic determination of ntaupar in rpar calculations

#8 Post by kdoblhoff » Tue Apr 19, 2022 12:35 pm

Dear Merzuk,
I can confirm that the bug is gone when using VASP 6.3 (similar build, same machine). Thank you for the fix!
Best regards,
Katharina
