Performance loss due to context switching


jun
Newbie
Posts: 2
Joined: Fri Jun 24, 2016 4:08 am
License Nr.: 5-2411

#1 Post by jun » Wed Aug 10, 2016 3:50 am

Hi all,

VASP sometimes becomes much slower, seemingly at random, on our clusters. Looking at the timing information in OUTCAR, I noticed that the slow jobs report a very high number of voluntary context switches. At first I suspected that MKL routines were spawning too many threads, so I recompiled VASP with sequential MKL and also tried explicitly exporting MKL_NUM_THREADS=1 to see whether that helped. However, the oversubscription still persists. I don't know whether this is caused by my compilation or by the configuration of our clusters.
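
To compare slow and fast runs, the context-switch count can be normalised by the elapsed time. The following is only a minimal sketch: the function name `context_switch_rate` is made up for illustration, and the two label strings are assumed to match the timing block printed at the end of OUTCAR, so the patterns may need adjusting for other VASP versions.

```python
import re

# Labels assumed to match the timing block at the end of OUTCAR.
# Adjust the patterns if your VASP version prints them differently.
_FIELDS = {
    "elapsed_s": r"Elapsed time \(sec\):\s*([0-9.]+)",
    "voluntary_switches": r"Voluntary context switches:\s*(\d+)",
}


def context_switch_rate(outcar_text):
    """Return voluntary context switches per second of elapsed time.

    Returns None if the timing block is missing (for example, when the
    job was killed before VASP wrote its summary).
    """
    values = {}
    for name, pattern in _FIELDS.items():
        matches = re.findall(pattern, outcar_text)
        if not matches:
            return None
        # Use the last match in case the text contains several summaries.
        values[name] = float(matches[-1])
    if values["elapsed_s"] <= 0:
        return None
    return values["voluntary_switches"] / values["elapsed_s"]
```

Reading each OUTCAR with `open(path).read()` and passing the text to this function gives one number per job, which makes it easier to see whether the slow jobs cluster on particular nodes or core counts.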

Here is the makefile.include I used:
# Precompiler options
CPP_OPTIONS= -DMPI -DHOST=\"IFC91_ompi_phoenix\" -DIFC \
-DCACHE_SIZE=16000 -DPGF90 -Davoidalloc \
-DMPI_BLOCK=8000 -Duse_collective \
-DnoAugXCmeta -Duse_bse_te \
-Duse_shmem -Dtbdyn

CPP = fpp -f_com=no -free -w0 $*$(FUFFIX) $*$(SUFFIX) $(CPP_OPTIONS)

# Changed to libmkl_sequential.a
FC = mpifort -I${MKLROOT}/include
FCL = mpifort -mkl=sequential -Wl,--start-group ${MKLROOT}/lib/intel64/libmkl_intel_lp64.a \
${MKLROOT}/lib/intel64/libmkl_core.a \
${MKLROOT}/lib/intel64/libmkl_sequential.a -Wl,--end-group

FREE = -free -names lowercase

FFLAGS = -assume byterecl -heap-arrays 64
OFLAG = -O2
OFLAG_IN = $(OFLAG)
DEBUG = -O0

MKL_PATH = $(MKLROOT)/lib/intel64
BLAS =
LAPACK =
BLACS =
SCALAPACK =

OBJECTS = fftmpiw.o fftmpi_map.o fftw3d.o fft3dlib.o \
/home/a1692208/vasp5.4/fftw3xf/libfftw3xf_intel.a
INCS =-I$(MKLROOT)/include/fftw

LLIBS = $(SCALAPACK) $(LAPACK) $(BLAS) -lpthread -lm -ldl

OBJECTS_O1 += fft3dfurth.o fftw3d.o fftmpi.o fftmpiw.o
OBJECTS_O2 += fft3dlib.o

# For what used to be vasp.5.lib
CPP_LIB = $(CPP)
FC_LIB = $(FC)
CC_LIB = icc
CFLAGS_LIB = -O
FFLAGS_LIB = -O1
FREE_LIB = $(FREE)

OBJECTS_LIB= linpack_double.o getshmem.o

# Normally no need to change this
SRCDIR = ../../src
BINDIR = ../../bin

Our clusters run SLURM, so I think computational resources are assigned automatically and this shouldn't be a problem. Am I right? Does anyone have experience with avoiding oversubscription?
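
One way to test that assumption from inside a job is to compare the number of threads the job wants to run on a node with the number of CPUs it has been given there. The sketch below is illustrative only: the helper names are hypothetical, the number of MPI ranks per node has to be supplied by the user, and on Linux the CPU count could be taken from `len(os.sched_getaffinity(0))` or from the `SLURM_CPUS_ON_NODE` variable that SLURM sets inside a job.

```python
import os


def threads_from_env(env=None):
    """Threads per rank requested through the environment, or None if unset.

    MKL_NUM_THREADS is checked first because MKL prefers it over
    OMP_NUM_THREADS for its own routines.
    """
    env = os.environ if env is None else env
    for name in ("MKL_NUM_THREADS", "OMP_NUM_THREADS"):
        value = env.get(name, "")
        if value.isdigit() and int(value) > 0:
            return int(value)
    return None  # unset: a threaded library would pick its own default


def is_oversubscribed(ranks_on_node, threads_per_rank, cpus_available):
    """True if ranks * threads exceeds the CPUs available on the node."""
    return ranks_on_node * threads_per_rank > cpus_available
```

For example, 16 ranks with 2 threads each on a 16-CPU allocation gives `is_oversubscribed(16, 2, 16) == True`, whereas 16 single-threaded ranks on the same allocation do not oversubscribe it.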

Thanks in advance.

Jun.

support_vasp
Global Moderator
Posts: 1817
Joined: Mon Nov 18, 2019 11:00 am

Re: Performance loss due to context switching

#2 Post by support_vasp » Tue Sep 10, 2024 2:44 pm

Hi,

We're sorry that we didn’t answer your question. This does not live up to the quality of support that we aim to provide. The team has since expanded. If we can still help with your problem, please ask again in a new post, linking to this one, and we will answer as quickly as possible.

Best wishes,

VASP

