VASP 6.2.1 sporadically hangs for hybrid calculations using multiple nodes
Posted: Tue Nov 30, 2021 3:25 pm
by guyohad
Dear VASP developers,
Here is a minimal calculation that causes VASP 6.2.1 to sporadically hang in the middle of an SCF iteration. The more nodes we use, the more likely the calculation is to hang (on 4 nodes with a total of 96 MPI processes, it hangs 80% of the time). When it doesn't hang, it converges nicely. The calculation is a 2x2x2 supercell of GaAs with a k-grid consisting of just the Gamma point, using HSE and reading in a PBE WAVECAR as a starting point.
We think this is related to using hybrid functionals because we do not see the problem when using PBE. We have tried various Intel compilers (including Intel 2019); this changes the percentage of calculations that hang but never fully removes the problem. We have included our makefile.include in the zip file. Additionally, we have run the test suite and found that all calculations passed successfully.
We appreciate any help identifying the source of this problem.
Sincerely,
Guy
Re: VASP 6.2.1 sporadically hangs for hybrid calculations using multiple nodes
Posted: Thu Dec 09, 2021 7:00 am
by henrique_miranda
Thank you for the bug report.
We are trying to reproduce this issue on our side, but it is unlikely that we will see it.
In the meantime, there are a couple of things you could try that might help us narrow down where the problem is:
1. Compile the code with "-g -traceback -debug extended", run it, kill it when it hangs, and then post the traceback here.
2. Compile with openmpi and see if the problem persists.
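Step 1 can be automated with a small watchdog. The following is a minimal sketch, not part of VASP: the function name `run_with_hang_timeout`, the wall-time limit used to decide that a run "hangs", and the choice of SIGINT are all assumptions. Whether the MPI launcher forwards the signal to the ranks and whether the runtime then prints a traceback depends on your setup.

```python
import signal
import subprocess


def run_with_hang_timeout(command, timeout_s, sig=signal.SIGINT, grace_s=5.0):
    """Run `command`; if it exceeds `timeout_s` seconds, treat it as hung.

    A hung process is first sent `sig` (assumption: a binary built with
    traceback support may print its call stack to stderr on this signal),
    then killed outright if it has not exited after `grace_s` seconds.
    Returns (hung, stderr_text).
    """
    proc = subprocess.Popen(
        command,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.PIPE,
        text=True,
    )
    try:
        _, err = proc.communicate(timeout=timeout_s)
        return False, err
    except subprocess.TimeoutExpired:
        # Ask the process to stop so it has a chance to emit a traceback.
        proc.send_signal(sig)
        try:
            _, err = proc.communicate(timeout=grace_s)
        except subprocess.TimeoutExpired:
            # No reaction to the signal: force termination.
            proc.kill()
            _, err = proc.communicate()
        return True, err
```

A hypothetical call would be `run_with_hang_timeout(["mpirun", "-np", "96", "vasp_std"], timeout_s=1800)`, where the launcher, process count, executable name, and time limit are placeholders to adapt. The returned stderr text is what you would post if the run was flagged as hung.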
Re: VASP 6.2.1 sporadically hangs for hybrid calculations using multiple nodes
Posted: Tue Dec 14, 2021 11:52 am
by guyohad
Hi Henrique,
I compiled with openmpi and I still get the same problem. Attached are the tracebacks for both the openmpi version and the mpi only version. As you can see, they hang in the same location. Are you able to reproduce the error?
Best,
Guy
Re: VASP 6.2.1 sporadically hangs for hybrid calculations using multiple nodes
Posted: Sat Dec 18, 2021 3:46 pm
by henrique_miranda
Thank you for the traceback.
This makes it clearer where the problem might be.
We have encountered issues with non-blocking communication in some MPI versions.
There is a toggle in mpi.F that you can uncomment to check whether it solves your problem:
!#define MPI_avoid_bcast
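"Uncommenting" here means removing the leading `!`, which is the Fortran comment character, so that the preprocessor sees an active `#define`. For scripted builds, the edit could be sketched as below. The helper name `enable_define` is hypothetical, and the sketch assumes the toggle appears on a line of its own in exactly the commented form shown above.

```python
import re


def enable_define(source_text, macro):
    """Activate a '!#define MACRO' line by dropping the Fortran comment marker.

    Lines that do not match the commented-out define are left untouched.
    """
    pattern = re.compile(
        r"^([ \t]*)![ \t]*(#define[ \t]+" + re.escape(macro) + r"\b)",
        re.MULTILINE,
    )
    return pattern.sub(r"\1\2", source_text)
```

Applied to the text of mpi.F with `macro="MPI_avoid_bcast"`, this turns `!#define MPI_avoid_bcast` into `#define MPI_avoid_bcast`. The code must then be recompiled for the change to take effect.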
Re: VASP 6.2.1 sporadically hangs for hybrid calculations using multiple nodes
Posted: Sun Dec 19, 2021 1:49 pm
by guyohad
Hi Henrique,
I uncommented #define MPI_avoid_bcast; however, VASP still hangs 80% of the time. The traceback indicates that the calculation gets stuck at the same location in the code. What's the next thing we can try?
Thanks,
Guy
Re: VASP 6.2.1 sporadically hangs for hybrid calculations using multiple nodes
Posted: Mon Dec 20, 2021 11:42 am
by andreas.singraber
Hello Guy,
I could reproduce the hang-ups for VASP 6.2.1 on a machine with 44 cores and with the latest Intel compiler (2021.4). It seems the problem comes from non-blocking MPI communication, with which we have had similar issues before. In the past, the "culprit" was MPI_Ibcast, and we could even write a small reproducer code snippet that strongly indicates a problem with the Intel compiler/MPI. In your case there seems to be a similar issue with MPI_Ireduce. In any case, the upcoming VASP version will avoid these calls by default (usually without loss of performance), and we will re-evaluate at a later time whether non-blocking global communication calls work reliably.
So, at this point I have two potential solutions for you:
(1) Either you wait a few more days until the upcoming release and try directly with the newest version of VASP,
(2) or you copy the attached mpi.F into your VASP 6.2.1 src directory and recompile the whole code with the Intel compiler.
In my case both options worked, and I hope one of them will solve the issue for you too! Could you please test it and report back? Thanks!
All the best,
Andreas Singraber
Re: VASP 6.2.1 sporadically hangs for hybrid calculations using multiple nodes
Posted: Tue Dec 21, 2021 2:39 pm
by guyohad
Thank you very much, Andreas! The new mpi.F file fixed the problem, and we even noticed a ~10% speedup.
Best,
Guy
Re: VASP 6.2.1 sporadically hangs for hybrid calculations using multiple nodes
Posted: Tue Dec 21, 2021 3:00 pm
by andreas.singraber
Hi!
Great :-), thank you for reporting back!
Best,
Andreas