VASP 6.3 ACC OMP problem
Posted: Thu Feb 03, 2022 7:50 am
by Dankomaister
Hi,
I have compiled VASP 6.3 using the NVHPC compilers and this makefile
makefile.include.nvhpc_ompi_mkl_omp_acc
It runs fine if I disable OpenMP,
but with OpenMP enabled, for example
VASP freezes after "entering main loop".
I have verified this problem on two different machines with different versions of NVHPC (21.11 and 22.1).
Compiling with the makefile I used for VASP 6.2.1 also results in the same problem (it worked fine for VASP 6.2.1).
This seems to be a bug introduced in VASP 6.3. Can you confirm this?
/Daniel
Re: VASP 6.3 ACC OMP problem
Posted: Mon Feb 07, 2022 10:23 am
by alexey.tal
Hi,
Could you please provide the versions of the libraries and the input files, so that we can try to reproduce this issue.
Best regards,
Alexey
Re: VASP 6.3 ACC OMP problem
Posted: Wed Feb 09, 2022 12:41 am
by Dankomaister
Sure!
The VASP input does not seem to matter, since I get this problem on all the systems I have tested.
But I have attached the input files for one of them.
The versions of the libraries used are:
CUDA/11.4.3
NVHPC/22.1
imkl/2021.4.0
OpenMPI/4.1.1
PMIx/4.1.0
UCX/1.11.2
UCX-CUDA/1.11.2
All installed using EasyBuild.
/Daniel
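For reports like the version list above, a minimal Python sketch can collect the versions automatically instead of typing them by hand. It assumes the modules were generated by EasyBuild, which conventionally exports an `EBVERSION<NAME>` environment variable for each loaded module; the function name `easybuild_versions` is hypothetical, and module names come back in EasyBuild's upper-case form (e.g. `OPENMPI`) rather than exactly as typed in `module load`.

```python
import os
from typing import Dict, Mapping, Optional

PREFIX = "EBVERSION"


def easybuild_versions(environ: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Map each loaded EasyBuild module (upper-cased name) to its version.

    Assumes EasyBuild-generated modules, which export EBVERSION<NAME>.
    """
    env = os.environ if environ is None else environ
    # Sorting the items gives a stable, alphabetical order in the result.
    return {
        key[len(PREFIX):]: value
        for key, value in sorted(env.items())
        if key.startswith(PREFIX) and len(key) > len(PREFIX)
    }


if __name__ == "__main__":
    # One "NAME/version" line per module, ready to paste into a forum post.
    for name, version in easybuild_versions().items():
        print(f"{name}/{version}")
```

Passing an explicit mapping instead of reading `os.environ` makes it easy to compare the environments of two machines that show the same problem.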
Re: VASP 6.3 ACC OMP problem
Posted: Tue Feb 15, 2022 12:13 am
by Dankomaister
Any update on fixing this?
/Daniel
Re: VASP 6.3 ACC OMP problem
Posted: Tue Feb 15, 2022 2:27 pm
by alexey.tal
Hi Daniel,
Thank you for sending the files and the library versions.
I tested your calculation on our machines and I wasn't able to reproduce this issue. However, there is a somewhat similar report on the forum. People from NVIDIA confirmed that on their machines VASP 6.3 does run with OpenMP threads without any problems.
These types of issues are likely specific to the environment of your computer and are not bugs. Unfortunately, in such cases it is very hard to tell what the underlying problem might be. I would suggest checking with the administrators of your computer whether your job is correctly set up for GPUs and whether all the modules and variables are correctly loaded.
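When checking how a job is set up, it can help to print the run-time variables that affect OpenMP threading and GPU visibility, so that a working run with OpenMP disabled and a hanging run with several threads can be compared side by side. This is a hypothetical Python sketch; the variable list is an illustrative assumption (standard OpenMP and CUDA variables), not a list of settings VASP requires.

```python
import os
from typing import Dict, Mapping, Optional

# Standard OpenMP/CUDA variables that influence threading and GPU visibility.
# Illustrative selection (an assumption), not a list required by VASP.
RELEVANT_VARS = (
    "OMP_NUM_THREADS",
    "OMP_PLACES",
    "OMP_PROC_BIND",
    "OMP_STACKSIZE",
    "CUDA_VISIBLE_DEVICES",
)


def runtime_settings(environ: Optional[Mapping[str, str]] = None) -> Dict[str, Optional[str]]:
    """Return each relevant variable's value, or None if it is unset."""
    env = os.environ if environ is None else environ
    return {name: env.get(name) for name in RELEVANT_VARS}


def omp_threads(environ: Optional[Mapping[str, str]] = None) -> Optional[int]:
    """Return OMP_NUM_THREADS as a positive int, or None if unset or not a single integer."""
    value = runtime_settings(environ)["OMP_NUM_THREADS"]
    if value is None or not value.strip().isdigit():
        return None
    threads = int(value)
    return threads if threads >= 1 else None


if __name__ == "__main__":
    for name, value in runtime_settings().items():
        print(f"{name}={value if value is not None else '(unset)'}")
    print("OpenMP threads per process:", omp_threads())
```

Nested settings such as `OMP_NUM_THREADS=4,2` are deliberately reported as `None` here, since the sketch only handles a single thread count.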
Re: VASP 6.3 ACC OMP problem
Posted: Wed Feb 16, 2022 1:20 am
by Dankomaister
Hi Alexey,
I saw that post on the forum yesterday, and I also believe it is the same problem as mine.
I also noticed that the user compiled VASP using compilers/libraries in an EasyBuild environment.
When you test this on your machine do you also use EasyBuild?
If not, I would suggest that you set up an EasyBuild environment and try to reproduce my results.
Since I am the administrator of our HPC I can provide the necessary files/instructions.
It would be a good idea anyway for you to have an EasyBuild environment available,
since EasyBuild is one of the most common (and practical) ways of reliably reproducing installations of scientific software on HPC clusters.
Regarding your suggestion to check that our HPC is set up for GPUs: since I am the administrator of our HPC, I do believe it is set up correctly,
and in any case I tested on a different HPC cluster and was able to reproduce the problem. As I also mentioned, VASP 6.2.1 works fine using the same makefile, compilers and libraries, so this is most likely a bug that was introduced in VASP 6.3.
/Daniel
Re: VASP 6.3 ACC OMP problem
Posted: Thu Feb 17, 2022 8:33 am
by alexey.tal
I asked Martijn Marsman about this issue and he told me that he has also encountered this problem with NVHPC SDK 21.11. I see that you have tried 21.11 and the newer 22.1; it is quite likely that this problem was introduced in 21.11 and has not been fixed yet in 22.1.
I tested your job with 21.2 and it worked fine.
Could you please check if you can reproduce this issue with an older version of NVHPC SDK?
Re: VASP 6.3 ACC OMP problem
Posted: Thu Feb 17, 2022 12:10 pm
by Dankomaister
Okay, sounds promising!
I will try different versions of the NVIDIA HPC SDK and see what the results are!
/Daniel
Re: VASP 6.3 ACC OMP problem
Posted: Wed Feb 23, 2022 6:32 am
by Dankomaister
Okay! I have tried compiling VASP 6.3 with NVHPC version 22.2, but the problem still persists.
However, version 21.2 works! Perhaps it is a good idea to mention this in the wiki?
/Daniel
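The NVHPC versions reported in this thread can be summarized in a small lookup, for example to print a warning in a build script before compiling VASP 6.3 with OpenMP and OpenACC. This is a hypothetical Python sketch that encodes only the reports above (21.2 working; 21.11, 22.1 and 22.2 hanging); any other version is treated as untested rather than guessed.

```python
from typing import Tuple

# NVHPC versions reported in this thread for VASP 6.3 with OpenMP + OpenACC.
# Hypothetical data structure; it only encodes the reports above.
REPORTED_WORKING = {(21, 2)}
REPORTED_HANG = {(21, 11), (22, 1), (22, 2)}


def parse_version(text: str) -> Tuple[int, int]:
    """Turn an NVHPC version string such as "22.1" into (22, 1)."""
    major, minor = text.strip().split(".")[:2]
    return int(major), int(minor)


def classify(version: str) -> str:
    """Return what this thread reports for the given NVHPC version."""
    v = parse_version(version)
    if v in REPORTED_WORKING:
        return "reported working"
    if v in REPORTED_HANG:
        return "reported hang"
    return "untested"


if __name__ == "__main__":
    for candidate in ("21.2", "21.11", "22.2", "21.9"):
        print(candidate, "->", classify(candidate))
```

Parsing into integer tuples matters here: compared as plain strings, "21.11" would sort before "21.2", which would get the release order wrong.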